(57) Abstract: This invention provides a method of sorting genes comprising: (1) preparing ds cDNA molecules from mRNA
molecules; (2) digesting the ds cDNA molecules; (3) ligating to the digested cDNA molecules a set of dsDNA oligonucleotide
adaptors; (4) amplifying the ligated cDNA molecules; and (5) sorting the amplified cDNA molecules into non-redundant groups. This
invention also provides two additional methods of sorting genes. This invention further provides a method of making sub-libraries
of ligation sets and a method of making sub-libraries of genetic vectors.
TITLE

SEQUENCE-DEPENDENT GENE SORTING TECHNIQUES 
FIELD OF THE INVENTION 

The present invention discloses techniques for simply and efficiently sorting
expressed genes into non-redundant groups of cDNA molecules reverse-transcribed
from any source of eukaryotic RNA. These groups of cDNA molecules can
themselves be used for genetic analyses according to methods in the art, or they can be
further sorted according to the techniques of the present invention. By applying these
techniques one can obtain a collection of non-redundant subgroups of cDNA
molecules, with every expressed-gene transcript from the original mRNA sample
uniquely represented in its own subgroup. The method further provides a stage in
which each expressed-gene transcript is found in one tube, i.e. "one gene per well."
Uses of the present invention include isolation, identification and analysis of genes,
analysis and diagnosis of disease states, study of cellular differentiation, and gene
therapy.

BACKGROUND OF THE INVENTION

The production of cDNA or gene libraries has involved cloning by the use of
cloning vectors placed in host organisms such as bacteria or yeast. These libraries
suffer from redundancy: they contain either multiple copies of particular cDNA
sequences, or multiple cDNA fragments from each expressed gene, or both. This
redundancy persists in all current normalization procedures. The presence in a
collection of cDNAs of multiple copies of particular cDNA sequences, and/or
multiple cDNA fragments from each expressed gene, can result in pointless
duplication of research efforts and other significant inefficiencies.

U.S. Patent No. 5,707,807 concerns the creation of subgroups of DNA by
repeated digestions with a number of restriction enzymes, followed by ligation with
adaptors having a common primer template, PCR amplification and, finally,
comparison of patterns of PCR products separated by polyacrylamide-gel
electrophoresis. The method of this patent creates groups of DNA molecules.
However, because each PCR step indiscriminately amplifies all ligated DNA
molecules in each sample, the method has a limited capacity to sort DNA into
non-redundant groups.

Unrau and Deugau (1994) Gene 145:163-169 concerns characterizing
fragments of digested DNA by the sequences of their cohesive ends and their
lengths, optionally aided by PCR. However, each PCR step indiscriminately
amplifies all ligated DNA molecules in each sample, and amplifies numerous DNA
fragments per gene. The method does not yield non-redundant groups of genes.

U.S. Patent No. 5,728,524 concerns obtaining groups of DNA molecules by
using pools of adaptors ligated to digested DNA, followed by PCR. Each PCR step
amplifies numerous DNA fragments per gene. The method fails to produce
non-redundant groups of genes.

Smith concerns a general method for PCR amplification of type II restriction
fragments by ligation of adaptors with degenerate end sequences complementary to
cohesive ends of digested DNA fragments. Each PCR step amplifies numerous
DNA fragments per gene. The method fails to produce non-redundant gene groups.

U.S. Patent No. 5,871,697 concerns classifying DNA sequences by making
extensive use of comparative databases and fragment-length and restriction-digest
information. The patent concerns DNA digestion and ligation of adaptors with
priming sequences specific for a particular restriction enzyme. The method in this
patent does not aim at the production of non-redundant groups of genes.

Throughout this application, various references are cited by author and
publication date. Each of these publications, each of the documents cited in each
of these publications, and each document referenced or cited in the publication-cited
documents are hereby incorporated herein by reference.
OBJECTS AND SUMMARY OF THE INVENTION

The present invention provides novel methods for producing a non-redundant
cDNA or gene library. The methods sort DNA on a sequence-dependent basis into
non-redundant groups. At the same time, however, these methods eliminate the
need to determine any of the DNA sequences prior to sorting and identifying genes.

One object of the present invention is to provide a method of sorting cDNA or
genes into non-redundant groups, which can then be analyzed by techniques known
in the art. One of many such techniques is the cDNA microarray method, in which the
cDNA clones derived from the present invention are used to produce the array that is
then examined by hybridization to determine differential gene expression.

Another technique is differential display of gel-electrophoresis patterns
involving mRNA sources to analyze biological models such as disease states or
cellular differentiation. In application of this technique the groups derived from the
present invention can be used for differential display of gel-electrophoresis patterns.

Another object of the present invention is to provide a method of obtaining a
collection of non-redundant subgroups of cDNA molecules, with every
expressed-gene transcript from an original mRNA sample uniquely represented in its own
subgroup, i.e. "one gene per well." Such isolated genes have a wide variety of uses,
notably including gene therapy and analysis of the human genome.

The present invention provides a method of sorting genes and/or gene
fragments comprising the following steps (herein called "Method I"):

(1) preparing ds cDNA molecules from mRNA molecules by reverse
transcription, using a poly-T primer optionally having a general
primer-template sequence upstream from the poly-T sequence, yielding ds cDNA
molecules having the poly-T sequence, optionally having the general
primer-template sequence;

(2) digesting the ds cDNA molecules with a restriction enzyme that 
produces digested cDNA molecules with cohesive ends having overhanging 
ssDNA sequences of a constant number of arbitrary nucleotides; 

(3) ligating to the digested cDNA molecules a set of dsDNA
oligonucleotide adaptors, each of which adaptor has at one of its ends a
cohesive-end ssDNA adaptor sequence complementary to one of the possible
overhanging ssDNA sequences of the digested cDNA, at the opposite end a
specific primer-template sequence specific for the ssDNA adaptor
complementary sequence, and in between the ends a constant sequence that
is the same for all of the different adaptors of the set;

(4) amplifying by separate polymerase chain reactions (PCRs) the ligated
cDNA molecules, utilizing for each separate PCR a primer that anneals to the
cDNA poly-T sequence optionally having the cDNA general
primer-template, and a primer from a set of different specific primers that anneal to
the cDNA specific primer-template sequences; and

(5) sorting the amplified cDNA molecules into non-redundant groups by
collecting the amplification products after each separate PCR, each group of
amplified cDNA molecules determined by the specific primer that annealed
to the specific primer-template sequence and primed the PCR.

One embodiment of the present invention according to the principles of
Method I comprises a complete set of oligonucleotide adaptors and specific primers,
containing an oligonucleotide adaptor and a specific primer complementary to each of
the possible overhanging ssDNA sequences of the digested cDNA.

Another embodiment of the present invention according to the principles of
Method I further comprises:

(6) amplifying the sorted, non-redundant groups of cDNA molecules by
nesting PCR, each amplification utilizing a primer that anneals to the cDNA
poly-T sequence optionally having the cDNA general primer-template
sequence, as well as one of a set of nesting primers with the following
general formula:

5'-|sequence complementary to the constant sequence of the
oligonucleotide adaptors|-NIx-|1-5 nucleotides
complementary to one of the possible sequences of 1-5
nucleotides immediately upstream from the overhanging
ssDNA sequence on the cDNA|-3'

where N is an arbitrary nucleotide; I is inosine; and x = 1, 2, 3 or 4, being one fewer
than the constant number of nucleotides in the overhanging
ssDNA sequences; and

(7) sorting the amplified cDNA molecules into non-redundant subgroups
by collecting the amplification products after each separate nesting PCR,
each non-redundant subgroup of cDNA molecules determined by the
particular nested primer that complemented the 1-5 nucleotides immediately
upstream from the overhanging ssDNA sequence on the cDNA.

Another embodiment of the present invention according to the principles of
Method I further comprises conducting further PCRs with further nesting primers
complementary to the next immediately upstream cDNA nucleotides, thereby sorting
the amplified cDNA molecules further into non-redundant subgroups.

A preferred embodiment according to the principles of Method I further
comprises conducting further PCRs with further nesting primers complementary to the
next immediately upstream cDNA nucleotides until each non-redundant subgroup
contains only one type of cDNA molecule, with every expressed-gene transcript in the
mRNA sample uniquely represented in one of the non-redundant subgroups.

The present invention also concerns a method of sorting genes and/or gene
fragments comprising the following steps (herein called "Method II"):

(1) preparing ds cDNA molecules from mRNA molecules by reverse
transcription, using a poly-T primer optionally having a general
primer-template sequence upstream from the poly-T sequence, yielding ds cDNA
molecules having the poly-T sequence, optionally having the general
primer-template sequence;

(2) digesting the ds cDNA molecules with a first restriction enzyme that 
produces digested cDNA molecules with cohesive ends having first 
overhanging ssDNA sequences of a constant number of arbitrary nucleotides; 

(3) ligating to the digested cDNA molecules a set of dsDNA
oligonucleotide adaptors, each of which adaptor has at one of its ends a
cohesive-end ssDNA adaptor sequence complementary to one of the possible
first overhanging ssDNA sequences of the digested cDNA, at the opposite
end a specific primer-template sequence specific for the ssDNA adaptor
complementary sequence, and in between the ends a constant sequence that
is the same for all of the different adaptors of the set, and that contains a
recognition site for a second restriction enzyme that can cleave the ligated
cDNA molecules at a point further from the ligated oligonucleotide adaptor
than the overhanging ssDNA sequences of the digested cDNA, and can
create cohesive ends having second overhanging ssDNA sequences of a
constant number of arbitrary nucleotides;

(4) amplifying by separate PCRs the ligated cDNA molecules, utilizing
for each separate PCR a primer that anneals to the cDNA poly-T sequence
optionally having the cDNA general primer-template, and a primer from a
set of different specific primers that anneal to the cDNA specific
primer-template sequences; and

(5) sorting the amplified cDNA molecules into non-redundant groups by
collecting the amplification products after each separate PCR, each group of
amplified cDNA molecules determined by the specific primer that annealed
to the specific primer-template sequence and primed the PCR.

One embodiment of the present invention according to the principles of
Method II comprises using a complete set of oligonucleotide adaptors and specific
primers, containing an oligonucleotide adaptor and a specific primer complementary
to each of the possible first overhanging ssDNA sequences of the digested cDNA.

Another embodiment of the present invention according to the principles of
Method II further comprises:

(6) digesting the sorted non-redundant groups of cDNA molecules with
the second restriction enzyme, cleaving the ligated cDNA molecules at a
point further from the ligated oligonucleotide adaptor than the overhanging
ssDNA sequences of the digested cDNA, and creating cohesive ends having
second overhanging ssDNA sequences of a constant number of arbitrary
nucleotides;

(7) ligating to the digested cDNA molecules a set of nesting dsDNA
oligonucleotide adaptors, each of which adaptor has at one of its ends a
cohesive-end ssDNA adaptor sequence complementary to one of the possible
second overhanging ssDNA sequences of the digested cDNA, at the opposite
end a specific primer-template sequence unique for the ssDNA adaptor
complementary sequence, and in between the ends a constant sequence that
is the same for all of the different adaptors of the set, and that contains the
recognition site for the second restriction enzyme;

(8) amplifying by separate PCRs the ligated cDNA molecules, utilizing
for each separate PCR a primer that anneals to the cDNA poly-T sequence
optionally having the cDNA general primer-template, and a primer from a
set of different specific primers that anneal to the cDNA specific
primer-template sequences; and

(9) sorting the amplified cDNA molecules into non-redundant subgroups
by collecting the amplification products after each separate PCR, each
subgroup of amplified cDNA molecules determined by the specific primer
that annealed to the specific primer-template sequence and primed the PCR.

One embodiment of the present invention according to the principles of
Method II further comprises using a complete set of nesting dsDNA oligonucleotide
adaptors, containing an oligonucleotide adaptor complementary to each of the possible
second overhanging ssDNA sequences of the digested cDNA.

Another embodiment according to the principles of Method II further
comprises conducting further PCRs using further nesting oligonucleotide adaptors,
optionally with different restriction enzymes and recognition sites, thereby sorting the
amplified cDNA molecules further into non-redundant subgroups.

A preferred embodiment according to the principles of Method II further
comprises conducting further ligations with further nesting oligonucleotide adaptors,
optionally with different restriction enzymes and recognition sites, until each
non-redundant subgroup contains only one type of cDNA molecule, with every
expressed-gene transcript in the mRNA sample uniquely represented in one of the
non-redundant subgroups.

The present invention also provides a method (Method III) of sorting genes
and/or gene fragments comprising the steps of:

(1) preparing ds cDNA molecules from mRNA molecules by reverse
transcription, using a poly-T primer having a general primer-template
sequence upstream from the poly-T sequence that includes a recognition
sequence for a restriction enzyme, yielding ds cDNA molecules having the
poly-T sequence, having the general primer-template sequence;

(2) dividing the cDNA into N pools, wherein N is 1 to 25, by digesting
the ds cDNA molecules with different restriction enzymes that produce
digested cDNA molecules with cohesive ends having overhanging ssDNA
sequences of a constant number of arbitrary nucleotides;

(3) ligating to the digested cDNA molecules of each pool a set of dsDNA
oligonucleotide adaptors, each of which adaptor has at one of its ends a
cohesive-end ssDNA adaptor sequence complementary to one of the possible
overhanging ssDNA sequences of the digested cDNA, at the opposite end a
specific primer-template sequence specific for the ssDNA adaptor
complementary sequence, and in between the ends a constant sequence that
is the same for all of the different adaptors of the set;

(4) amplifying by separate PCRs the ligated cDNA molecules of each
pool, utilizing for each separate PCR a primer that anneals to the cDNA
poly-T sequence optionally having the cDNA general primer-template, and a
primer from a set of different specific primers that anneal to the cDNA
specific primer-template sequences;

(5) sorting the amplified cDNA molecules from each pool into
non-overlapping groups by collecting the amplification products after each
separate PCR, each group of amplified cDNA molecules determined by the
specific primer that annealed to the specific primer-template sequence and
primed the PCR, wherein each of the restriction enzymes digests the N
separate cDNA pools into 64 or 256 non-redundant sub-groups; and

(6) digesting cDNA fragments in each non-redundant sub-group of the
cDNA pools with different restriction enzymes and further purifying the
digested cDNA fragments by removing the small end fragments produced by
the digestion.

This invention also provides a method of making sub-libraries of ligation
sets by ligating restriction enzyme digested fragments generated by Method III into a
plasmid vector that has a recognition sequence for said restriction enzymes and
predigesting with these enzymes to make 64xN or 256xN sets of ligations, wherein
N is 1 to 25.

This invention further provides a method of making sub-libraries of bacterial
colonies, wherein the set of ligations, generated in the method of making
sub-libraries of ligation sets, are transformed into bacteria and plated onto bacterial
growth plates to produce bacteria colonies containing each of the 64xN or 256xN
non-redundant subgroups of cDNA fragments, wherein N is 1 to 25.

In one embodiment of Method III, N is two and the restriction enzyme in step
(1) comprises A^cl or another similar rare restriction enzyme. 
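
As a rough illustration of the 64xN and 256xN counts stated above, the following Python sketch computes the number of ligation sets for a given number of pools N. It assumes that 64 and 256 correspond to overhangs of 3 and 4 arbitrary nucleotides respectively (4^3 and 4^4); the function name `sublibrary_count` is hypothetical and is not part of the disclosed method.

```python
def sublibrary_count(overhang_len: int, n_pools: int) -> int:
    """Number of ligation sets for N pools (illustrative only).

    Assumes one sub-group per possible overhang of `overhang_len`
    arbitrary nucleotides, i.e. 4**overhang_len sub-groups per pool.
    """
    if not 1 <= n_pools <= 25:
        raise ValueError("N is stated to be 1 to 25")
    if overhang_len not in (3, 4):
        raise ValueError("only 3-nt (64) and 4-nt (256) overhangs are modelled")
    return (4 ** overhang_len) * n_pools


# N = 2, as in the embodiment described above
print(sublibrary_count(3, 2))  # 64 x 2
print(sublibrary_count(4, 2))  # 256 x 2
```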
BRIEF DESCRIPTION OF THE DRAWINGS 

The following Detailed Description, given by way of example, but not
intended to limit the invention to specific embodiments described, may be
understood in conjunction with the accompanying Figures, incorporated herein by
reference, in which:

Figure 1 schematically illustrates one embodiment, Primer Nesting Option,
of the principles of Method I showing a flow chart using a specific sequence as an
example (SEQ ID NOs: 36-48).

Figure 2 schematically illustrates one embodiment, Ligation Nesting Option
[64-256-16], of the principles of Method II showing a flow chart using a specific
sequence as an example. The first four steps are shown in Figure 1 (SEQ ID NOs:
49-62).

Figure 3 schematically illustrates an alternative embodiment, Ligation
Nesting Option [64-64-64], of the principles of Method II showing a flow chart
using a specific sequence as an example. The first four steps are shown in Figure 2
(SEQ ID NOs: 63-76).

Figure 4 shows ligation specificity permitting isolation of the rat albumin
gene using A. standard ligation conditions and B. the methods of the present
invention.

Figure 5 shows ligation specificity using the human GAPDH gene with a
particular set of ligation adaptors using the methods of the present invention. The
results are shown as GAPDH-rev PCR analysis of GAPDH ligation specificity.

Figure 6 shows PCR amplification products derived from Jurkat-cell mRNA
using a particular set of ligation adaptors according to the methods of the present
invention. The double-stranded cDNA (derived from Jurkat cells) that was ligated to
the mix of all 64 "Tail adaptor set 1" adaptors was used as template. The cDNA
group ligated to each adaptor was amplified separately using the specific Tail primer
and the END primer. The figure shows the products of all 64 Tail-END
amplification reactions. Amplification products were separated on a 1.5% agarose
gel and ethidium bromide staining was used to visualize the DNA.

Figure 7 is a Southern blot of PCR amplification products derived from
Jurkat-cell mRNA showing ligation specificity according to the methods of the
present invention. The agarose gel shown in Figure 2 was blotted onto nylon
membrane (Nytran, Schleicher & Schuell). The membrane was then hybridized with
a radioactive (32P) probe specific to the human GAPDH gene. The specific signal
was obtained in the correct "AGG" lane only. A weaker signal, observed in the
"CCC" lane, is not of the correct size and can be caused by spurious amplification of
the abundant GAPDH cDNA by the "CCC" Tail primer alone.

RECTIFIED SHEET (RULE 91)

Figure 8 shows isolation of three different genes obtained by using the first
nesting PCR primers according to the methods of the present invention. The
specific END-Tail groups, expected to contain the GAPDH, KU autoantigen and
fibrillarin cDNAs, were used as a template for nesting PCR. Nesting primers from
the "1st Nest 256" set, expected to amplify these three genes, were used. Amplification
products were separated on a 1.5% agarose gel and ethidium bromide staining was
used to visualize the DNA. For the GAPDH and KU antigen cDNAs, single bands of
the correct size are observed. For fibrillarin cDNA, three bands are observed; one of
them, the middle 650 bp band, is of the expected size.

Figure 9 shows isolation of three different genes obtained by using second
nesting primers according to the methods of the present invention. Products of the
1st nesting reactions were used as template for the second nesting. Primers from the
"2nd nest 16" set were chosen that are expected to amplify the three cDNAs. As
expected, single strong bands were obtained for GAPDH and KU autoantigen
cDNAs. For fibrillarin, the second nesting step separated the three bands and only
the correct 650 bp band was obtained.

Figure 10 shows the general structure of the primer and the primer set J2
(SEQ ID NOs: 77-141).

Figure 11 shows tail primers set J2 (SEQ ID NOs: 142-205).

Figure 12 shows tail primers (set number 2) (SEQ ID NOs: 206-265).

Figure 13 shows tail primers set 256 (SEQ ID NOs: 266-521).

Figure 14 shows first nesting primers 256 for tail adaptor 64 set 1 (SEQ ID
NOs: 522-777).

Figure 15 shows first nesting primers 64 for tail adaptor 64 set 1 (SEQ ID
NOs: 778-841).

Figure 16 shows second nesting primers 64 for tail adaptor 64 set 1 (SEQ ID
NOs: 842-905).

Figure 17 shows second nesting primers 16 for tail adaptor 64 set 1 (SEQ ID
NOs: 906-921).

Figure 18 shows tail adaptors set 256 (SEQ ID NOs: 922-1177).

Figure 19 shows tail adaptors 64 (set number 1) (SEQ ID NOs: 1178-1241)
and helper oligonucleotides (SEQ ID NOs: 1242-1244).

Figure 20 shows tail adaptors 64 (set number 2) (SEQ ID NOs: 1245-1308)
and helper oligonucleotides (SEQ ID NOs: 1309-1311).
DETAILED DESCRIPTION OF THE INVENTION

The present invention provides techniques for obtaining groups of
non-redundant cDNA molecules, including cDNA libraries containing "one gene per
well" for every gene transcript present in an original mRNA source. These
techniques sort DNA on a sequence-dependent basis into non-redundant groups,
using PCR combined with (1) an initial step of "differential ligation" using a pool of
dsDNA ligation adaptors, each of which has an arbitrary ssDNA end and a primer
template specific for the ssDNA end, and optional further steps using (2) either
nesting primers (in Method I) or nesting ligation adaptors (in Method II).

Method I broadly concerns a method of sorting genes and/or gene fragments
comprising the following steps:

(1) preparing ds cDNA molecules from mRNA molecules by reverse
transcription, using a poly-T primer optionally having a general
primer-template sequence upstream from the poly-T sequence, yielding ds cDNA
molecules having the poly-T sequence, optionally having the general
primer-template sequence;

(2) digesting the ds cDNA molecules with a restriction enzyme that 
produces digested cDNA molecules with cohesive ends having overhanging 
ssDNA sequences of a constant number of arbitrary nucleotides; 

(3) ligating to the digested cDNA molecules a set of dsDNA
oligonucleotide adaptors, each of which adaptor has at one of its ends a
cohesive-end ssDNA adaptor sequence complementary to one of the possible
overhanging ssDNA sequences of the digested cDNA, at the opposite end a
specific primer-template sequence specific for the ssDNA adaptor
complementary sequence, and in between the ends a constant sequence that
is the same for all of the different adaptors of the set;

(4) amplifying by separate PCRs the ligated cDNA molecules, utilizing
for each separate PCR a primer that anneals to the cDNA poly-T sequence
optionally having the cDNA general primer-template, and a primer from a
set of different specific primers that anneal to the cDNA specific
primer-template sequences; and

(5) sorting the amplified cDNA molecules into non-redundant groups by
collecting the amplification products after each separate PCR, each group of
amplified cDNA molecules determined by the specific primer that annealed
to the specific primer-template sequence and primed the PCR.
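
The sorting logic of steps (3) to (5) can be sketched in Python as a simple binning operation. This is a toy model under stated assumptions, not the disclosed protocol: each poly-T-containing fragment is reduced to a (name, overhang) pair, the specific primer-template is represented only by a hypothetical label such as `tail_primer_AGG`, and ligation and PCR are assumed to be perfectly specific. The gene names and function names are illustrative.

```python
from collections import defaultdict
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def build_adaptors(n: int) -> dict:
    """Hypothetical complete adaptor set for n-nt overhangs.

    Keys are adaptor cohesive ends; values are labels standing in for
    the specific primer that anneals to that adaptor's primer-template.
    """
    adaptors = {}
    for bases in product("ACGT", repeat=n):
        overhang = "".join(bases)
        adaptors[reverse_complement(overhang)] = f"tail_primer_{overhang}"
    return adaptors


def sort_by_specific_primer(fragments, n: int = 3) -> dict:
    """Group 3' fragments by the specific primer that would amplify them."""
    adaptors = build_adaptors(n)
    groups = defaultdict(list)
    for gene, overhang in fragments:
        cohesive_end = reverse_complement(overhang)  # step (3): matching adaptor ligates
        primer = adaptors[cohesive_end]              # step (4): one separate PCR per primer
        groups[primer].append(gene)                  # step (5): collect products per PCR
    return dict(groups)


# Illustrative input: hypothetical genes and overhangs
fragments = [("gene_a", "AGG"), ("gene_b", "CCC"), ("gene_c", "AGG")]
print(sort_by_specific_primer(fragments))
```

Because only the poly-T-containing fragment of each transcript carries both priming sites in this model, each gene appears in exactly one group.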

The restriction enzyme can be any enzyme that produces digested cDNA
molecules with cohesive ends having overhanging ssDNA sequences of a constant
number of arbitrary nucleotides. Such restriction enzymes include type IIs
restriction enzymes, including BbvI, BspMI, FokI, HgaI, MboI and SfaNI. Suitable
type II restriction enzymes include BglI, BstXI and SfiI.

The groups of cDNA molecules produced by the techniques of Method I are
non-redundant: only one DNA sequence will be present for each gene, since for each
gene only the poly-T-containing fragment (possibly the entire gene) is primed and
amplified. As used in this invention, all genes present as transcripts in an mRNA
sample were obtained using complete sets of redundant adaptors. Thus, one
embodiment according to the principles of Method I comprises using a complete set of
oligonucleotide adaptors and specific primers, containing an oligonucleotide adaptor
and a specific primer complementary to each of the possible overhanging ssDNA
sequences of the digested cDNA. If the constant number of arbitrary nucleotides in
the overhanging ssDNA is 3, then a complete set of adaptors includes 4^3 or 64
different oligonucleotide adaptors. If the constant number of arbitrary nucleotides is
4, then a complete set includes 4^4 or 256 different adaptors.
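
The size of a complete adaptor set follows directly from enumerating every possible overhang. The short Python sketch below illustrates this for 3-nt and 4-nt overhangs; the function names are hypothetical, and the adaptor is represented only by its cohesive end (the reverse complement of the cDNA overhang), omitting the constant and primer-template portions.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def all_overhangs(n: int) -> list:
    """Every possible overhang of n arbitrary nucleotides."""
    return ["".join(p) for p in product(BASES, repeat=n)]


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def complete_adaptor_set(n: int) -> dict:
    """Map each possible cDNA overhang to the adaptor cohesive end pairing with it."""
    return {overhang: reverse_complement(overhang) for overhang in all_overhangs(n)}


set3 = complete_adaptor_set(3)
set4 = complete_adaptor_set(4)
print(len(set3), len(set4))  # 64 256
```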

Another embodiment of Method I utilizes adaptors in which the 3'-most nucleotide
of the ssDNA complementary sequence of the oligonucleotide adaptor is an arbitrary
nucleotide N, which pairs with the 5'-most nucleotide of each of the possible
overhanging ssDNA sequences of the digested cDNA. A complete set of this kind of
adaptors contains an oligonucleotide adaptor (for a specific primer) complementary to
each of the possible overhanging ssDNA sequences of the digested cDNA excluding
the 5'-most nucleotide that pairs with the arbitrary nucleotide N of the oligonucleotide
adaptor.
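
The effect of the arbitrary nucleotide N on set size can be illustrated with a small Python sketch. It assumes overhangs are written 5'->3' and that an adaptor of this kind distinguishes overhangs only by the nucleotides after the 5'-most position; the function name `degenerate_adaptor_groups` is hypothetical.

```python
from collections import defaultdict
from itertools import product


def degenerate_adaptor_groups(n: int) -> dict:
    """Group all n-nt overhangs by the positions an N-containing adaptor reads.

    The 5'-most overhang nucleotide pairs with N and is ignored, so
    overhangs differing only at that position share one adaptor.
    """
    groups = defaultdict(list)
    for bases in product("ACGT", repeat=n):
        overhang = "".join(bases)
        groups[overhang[1:]].append(overhang)
    return dict(groups)


groups = degenerate_adaptor_groups(4)
print(len(groups))          # number of adaptors needed for 4-nt overhangs
print(groups["GGT"])        # the four overhangs served by one adaptor
```

Under these assumptions a 4-nt overhang needs 4^3 adaptors rather than 4^4, each serving four overhangs.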

One embodiment of the principles of Method I further comprises additional
steps:

(6) amplifying the sorted non-redundant groups of cDNA molecules by
nesting PCR, each amplification utilizing a primer that anneals to the cDNA
poly-T sequence optionally having the cDNA general primer-template
sequence, as well as one of a set of nesting primers with the following general
formula:

5'-|sequence complementary to the constant sequence of
the oligonucleotide adaptors|-NIx-|1-5 nucleotides
complementary to one of the possible sequences of 1-5
nucleotides immediately upstream from the overhanging
ssDNA sequence on the cDNA|-3'

where N is an arbitrary nucleotide; I is inosine; and x = 1, 2, 3 or 4, being one fewer
than the constant number of nucleotides in the
overhanging ssDNA sequences; and

(7) sorting the amplified cDNA molecules into non-redundant subgroups
by collecting the amplification products after each separate nesting PCR, each
non-redundant subgroup of cDNA molecules determined by the particular
nested primer that complemented the 1-5 nucleotides immediately upstream
from the overhanging ssDNA sequence on the cDNA.

As before, a complete set of nesting primers can be used, which set contains a
nesting primer complementary to each of the possible sequences of 1-5 nucleotides
immediately upstream from the overhanging ssDNA sequence on the cDNA.
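
A complete set of nesting primers of the general formula given above can be enumerated as follows. This Python sketch is illustrative only: the constant portion `GATTACA` is a placeholder and not a sequence from the disclosure, `N` and `I` are written as literal characters for the arbitrary nucleotide and inosine, and the function name `nesting_primers` is hypothetical.

```python
from itertools import product


def nesting_primers(constant: str, overhang_len: int, k: int) -> list:
    """Enumerate primers of the form 5'-|constant|-N I(x)-|k nucleotides|-3'.

    x is one fewer than the overhang length, so N plus the inosines span
    the overhang; the final k nucleotides (1 to 5) select the subgroup.
    """
    x = overhang_len - 1
    if x not in (1, 2, 3, 4):
        raise ValueError("x must be 1, 2, 3 or 4")
    if not 1 <= k <= 5:
        raise ValueError("k must be 1 to 5 nucleotides")
    spacer = "N" + "I" * x
    return [constant + spacer + "".join(p) for p in product("ACGT", repeat=k)]


primers = nesting_primers("GATTACA", overhang_len=4, k=2)
print(len(primers))   # 4**2 primers in the complete set
print(primers[0])
```

With k selective nucleotides the complete set contains 4^k primers, so k = 2 gives 16 and k = 4 gives 256.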

The principles of Method I can be used to conduct further PCRs with further
nesting primers complementary to the next immediately upstream cDNA nucleotides,
thereby sorting the amplified cDNA molecules further into non-redundant subgroups.
A preferred embodiment involves conducting further PCRs with further nesting
primers complementary to the next immediately upstream cDNA nucleotides until
each non-redundant subgroup contains only one type of cDNA molecule, with every
expressed-gene transcript in the mRNA sample uniquely represented in one of the
non-redundant subgroups, i.e. "one gene per well."
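
The repeated subdivision described above can be sketched as an iterative partition. The Python below is a toy model under stated assumptions: each fragment is a string read from the adaptor-ligated end inward (overhang first, then the adjacent nucleotides), each nesting round reads a fixed number of further nucleotides, and amplification is perfectly specific. The function name, the `|` separator in group keys and the example sequences are all hypothetical.

```python
from collections import defaultdict


def sort_until_unique(fragments, overhang_len=3, step=2, max_rounds=10):
    """Subdivide groups round by round until each holds one fragment type."""
    groups = defaultdict(list)
    for frag in fragments:
        groups[frag[:overhang_len]].append(frag)   # initial sort by overhang

    pos = overhang_len
    rounds = 0
    while any(len(g) > 1 for g in groups.values()) and rounds < max_rounds:
        refined = defaultdict(list)
        for key, members in groups.items():
            if len(members) == 1:
                refined[key].extend(members)       # already one per well
            else:
                for frag in members:
                    # next nesting round: read the next `step` nucleotides
                    refined[key + "|" + frag[pos:pos + step]].append(frag)
        groups = refined
        pos += step
        rounds += 1
    return dict(groups)


wells = sort_until_unique(["AGGTTACG", "AGGTTCCA", "CCCAAGTT", "AGGACGTA"])
for key, members in sorted(wells.items()):
    print(key, members)
```

The `max_rounds` bound is a safeguard for inputs that can never be separated, such as identical sequences.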

Method II broadly concerns a method of sorting genes and/or gene fragments
comprising the following steps:

(1) preparing ds cDNA molecules from mRNA molecules by reverse
transcription, using a poly-T primer optionally having a general
primer-template sequence upstream from the poly-T sequence, yielding ds cDNA
molecules having the poly-T sequence, optionally having the general
primer-template sequence;

(2) digesting the ds cDNA molecules with a first restriction enzyme that
produces digested cDNA molecules with cohesive ends having first
overhanging ssDNA sequences of a constant number of arbitrary nucleotides;

(3) ligating to the digested cDNA molecules a set of dsDNA
oligonucleotide adaptors, each of which adaptor has at one of its ends a
cohesive-end ssDNA adaptor sequence complementary to one of the possible
first overhanging ssDNA sequences of the digested cDNA, at the opposite
end a specific primer-template sequence specific for the ssDNA adaptor
complementary sequence, and in between the ends a constant sequence that
is the same for all of the different adaptors of the set, and that contains a
recognition site for a second restriction enzyme that can cleave the ligated
cDNA molecules at a point further from the ligated oligonucleotide adaptor
than the overhanging ssDNA sequences of the digested cDNA, and can
create cohesive ends having second overhanging ssDNA sequences of a
constant number of arbitrary nucleotides;

   (4) amplifying by separate PCRs the ligated cDNA molecules, utilizing
   for each separate PCR a primer that anneals to the cDNA poly-T sequence
   optionally having the cDNA general primer-template, and a primer from a
   set of different specific primers that anneal to the cDNA specific primer-
   template sequences; and

   (5) sorting the amplified cDNA molecules into non-redundant groups by
   collecting the amplification products after each separate PCR, each group of
   amplified cDNA molecules determined by the specific primer that annealed
   to the specific primer-template sequence and primed the PCR.

The first restriction enzyme can be any that produces digested cDNA
molecules with cohesive ends having overhanging ssDNA sequences of a constant
number of arbitrary nucleotides. Such restriction enzymes include type IIs
restriction enzymes, including BbvI, BspMI, FokI, HgaI, MboI and SfaNI. Suitable
type II restriction enzymes include BglI, BstXI and SfiI. The second restriction
enzyme can be a type II restriction enzyme that cleaves the ligated cDNA molecules
at a point further from the ligated oligonucleotide adaptor than the overhanging
ssDNA sequences of the digested cDNA, and creates cohesive ends having second
overhanging ssDNA sequences of a constant number of arbitrary nucleotides.
Examples of suitable type IIs restriction enzymes include BspMI.
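Steps (2) to (5) can be illustrated with a minimal sketch, offered only as a toy model and not as the procedure of the invention. It assumes the BbvI cut geometry drawn in Example 1 (top strand cut 8 nucleotides past GCAGC, bottom strand 12 nucleotides past it), treats each cDNA as a plain string, and groups cDNAs by the four-base overhang carried by their 3'-terminal fragment. The function names, the handling of a site on the opposite strand, and the toy sequences are assumptions made for illustration; the last sequence is a short excerpt of the GAPDH sequence given in Example 3.

```python
from collections import defaultdict

SITE = "GCAGC"       # BbvI recognition sequence as drawn in Example 1
SITE_RC = "GCTGC"    # the same site lying on the opposite strand
TOP_OFFSET = 8       # top-strand cut is 8 nt past the site
OVERHANG = 4         # bottom-strand cut is 4 nt further, leaving a 4-nt overhang


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def cuts(seq):
    """Return sorted (top_strand_cut_index, overhang) pairs for every site."""
    found = []
    for i in range(len(seq) - len(SITE) + 1):
        word = seq[i:i + len(SITE)]
        if word == SITE:
            top = i + len(SITE) + TOP_OFFSET   # cut lies downstream of the site
        elif word == SITE_RC:
            top = i - TOP_OFFSET - OVERHANG    # cut lies upstream of the site
        else:
            continue
        if top >= 0 and top + OVERHANG <= len(seq):
            found.append((top, seq[top:top + OVERHANG]))
    return sorted(found)


def three_prime_overhang(seq):
    """Overhang carried by the 3'-terminal (poly-A-containing) fragment."""
    found = cuts(seq)
    return found[-1][1] if found else None


def sort_by_overhang(cdnas):
    """Group cDNA names by the overhang of their 3'-terminal fragment."""
    groups = defaultdict(list)
    for name, seq in cdnas.items():
        groups[three_prime_overhang(seq)].append(name)
    return dict(groups)


# Hypothetical toy cDNAs (not real genes).
toy = {
    "geneA": "AAAA" + "GCAGC" + "TTTTTTTT" + "ACGT" + "CCCCCCCC",
    "geneB": "CCCC" + "ACGT" + "AAAAAAAA" + "GCTGC" + "TTTT",
    "geneC": "AAAA" + "GCAGC" + "TTTTTTTT" + "GGAT" + "CCCCCCCC",
}
groups = sort_by_overhang(toy)

# Excerpt of the GAPDH sequence of Example 3 around its 3'-most BbvI site.
GAPDH_EXCERPT = "CCTGCCTCTACTGGCGCTGCCAAGG"
gapdh_overhang = three_prime_overhang(GAPDH_EXCERPT)
```

In this sketch the GAPDH excerpt yields the overhang GCCT, whose reverse complement is AGGC, which is consistent with the adaptor Ada-AGG (AGG followed by the arbitrary nucleotide N) named in Example 3. Partial digestion, mis-ligation and PCR bias are ignored.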

As in Method I, in Method II a complete set of oligonucleotide adaptors and
specific primers contains an oligonucleotide adaptor and a specific primer
complementary to each of the possible first overhanging ssDNA sequences of the
digested cDNA. Where the 3'-most nucleotide of the ssDNA complementary
sequence of the oligonucleotide adaptor is an arbitrary nucleotide N, which pairs
with the 5'-most nucleotide of each of the possible first overhanging ssDNA
sequences of the digested cDNA, a complete set of oligonucleotide adaptors and
specific primers contains an oligonucleotide adaptor and a specific primer
complementary to each of the possible first overhanging ssDNA sequences of the
digested cDNA excluding the 5'-most nucleotide that pairs with the arbitrary
nucleotide N of the oligonucleotide adaptor.
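The size of a complete adaptor set follows directly from the overhang length. The sketch below enumerates the cohesive-end sequences for a four-base overhang, with and without one arbitrary position N, giving the sets of 256 and 64 used in Example 2. The function name is hypothetical, and writing N as the last character follows the XYZN notation of Example 2 rather than any statement about strand orientation.

```python
from itertools import product

BASES = "ACGT"


def adaptor_overhangs(length, wildcard_positions=0):
    """Enumerate adaptor cohesive ends; trailing positions may be the mix N."""
    specific = length - wildcard_positions
    return ["".join(p) + "N" * wildcard_positions
            for p in product(BASES, repeat=specific)]


set_64 = adaptor_overhangs(4, wildcard_positions=1)   # XYZN
set_256 = adaptor_overhangs(4)                        # WXYZ
```

Each entry would correspond to one adaptor and one matching specific primer; the specific primer-template sequences themselves are design choices and cannot be derived from the overhang.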

One embodiment of the principles of Method II further comprises additional
steps:

   (6) digesting the sorted non-redundant groups of cDNA molecules with
   the second restriction enzyme, cleaving the ligated cDNA molecules at a
   point further from the ligated oligonucleotide adaptor than the overhanging
   ssDNA sequences of the digested cDNA, and creating cohesive ends having
   second overhanging ssDNA sequences of a constant number of arbitrary
   nucleotides;

   (7) ligating to the digested cDNA molecules a set of nesting dsDNA
   oligonucleotide adaptors, each of which adaptor has at one of its ends a
   cohesive-end ssDNA adaptor sequence complementary to one of the possible
   second overhanging ssDNA sequences of the digested cDNA, at the opposite
   end a specific primer-template sequence unique for the ssDNA adaptor
   complementary sequence, and in between the ends a constant sequence that
   is the same for all of the different adaptors of the set, and that contains the
   recognition site for the second restriction enzyme;

   (8) amplifying by separate PCRs the ligated cDNA molecules, utilizing
   for each separate PCR a primer that anneals to the cDNA poly-T sequence
   optionally having the cDNA general primer-template, and a primer from a
   set of different specific primers that anneal to the cDNA specific primer-
   template sequences; and

   (9) sorting the amplified cDNA molecules into non-redundant subgroups
   by collecting the amplification products after each separate PCR, each
   subgroup of amplified cDNA molecules determined by the specific primer
   that annealed to the specific primer-template sequence and primed the PCR.

A complete set of nesting dsDNA oligonucleotide adaptors contains an
oligonucleotide adaptor complementary to each of the possible second overhanging
ssDNA sequences of the digested cDNA.

An embodiment of Method II includes conducting further PCRs using further
nesting oligonucleotide adaptors, optionally with different restriction enzymes and
recognition sites, thereby sorting the amplified cDNA molecules further into non-
redundant subgroups. If different restriction enzymes are used, they must cleave the
ligated cDNA molecules at a point further from the ligated oligonucleotide adaptor
than the overhanging ssDNA sequences of the digested cDNA, and create cohesive
ends having second overhanging ssDNA sequences of a constant number of arbitrary
nucleotides.

A preferred embodiment of Method II comprises repeating nesting ligation
and PCR until each non-redundant subgroup contains only one type of cDNA
molecule, with every expressed gene transcript in the mRNA sample uniquely
represented in one of the non-redundant subgroups, i.e. "one gene per well."
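The idea of repeating the nesting round until each subgroup holds a single species can be sketched as a recursive partition. This is a deliberately simplified model: it assumes each round simply reads the next four nucleotides of the fragment, whereas the actual procedure re-cuts with a type IIs enzyme and ligates a fresh adaptor set at every round. The function name, the window width and the toy fragments are hypothetical.

```python
def nest_sort(fragments, width=4, offset=0):
    """Recursively split fragments by successive `width`-nt windows.

    Returns a dict mapping a path of window sequences (joined by "/")
    to the list of fragments that ended up in that well.
    """
    groups = {}
    for frag in fragments:
        groups.setdefault(frag[offset:offset + width], []).append(frag)

    wells = {}
    for key, members in groups.items():
        # Stop when the group is unique or there is no sequence left to read.
        exhausted = offset + width >= max(len(m) for m in members)
        if len(members) == 1 or exhausted:
            wells[key] = members
        else:
            deeper = nest_sort(members, width, offset + width)
            for sub, subset in deeper.items():
                wells[key + "/" + sub] = subset
    return wells


# Hypothetical fragments: two share the first overhang and need a second round.
wells = nest_sort(["ACGTAAAA", "ACGTCCCC", "TTTTAAAA"])
```

Here the two fragments that share the first window ACGT are separated in the second round, while the third fragment is already alone after the first round, so every well ends with one species.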
Method III utilizes the non-redundant groups of cDNA fragments collected in
step (5) of Methods I and II for the preparation of sets of non-redundant sub-libraries of
cDNA. Such sub-libraries can be more economically used for the derivation of a
complete cDNA library by selecting a group of clones from the sub-libraries. The
principle of Method III is that cDNA fragments derived from a specific highly
abundant mRNA will converge into one group. Thus, a few groups will contain a
highly redundant cDNA population. These groups are identified by analysis of the
cDNA content of the group by sequencing or other methods. All other groups will be
devoid of cDNAs of highly redundant mRNAs, and thus of low redundancy, and are
used, in combination, to derive a full cDNA library. Since the elimination of the
groups that contain a highly redundant cDNA population also removes some cDNA
fragments of low redundancy mRNAs, an approach involving parallel processing of
two cDNA pools, each digested with a different type IIs restriction enzyme, is used.
This makes it highly improbable that a "rare" cDNA fragment will be found in a high-
redundancy group in both digest pools.
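The benefit of processing two pools in parallel can be estimated with a rough calculation. The sketch below assumes, purely for illustration, that a rare cDNA fragment falls into any of the groups of a pool with equal probability and that its group assignments in different pools are independent; neither assumption is stated in the text, and the number of excluded groups used here is hypothetical.

```python
def loss_probability(excluded_per_pool, groups_per_pool=64):
    """Chance that a rare cDNA lands in an excluded group in every pool.

    Assumes uniform, independent group assignment in each pool.
    """
    p = 1.0
    for excluded in excluded_per_pool:
        p *= excluded / groups_per_pool
    return p


one_pool = loss_probability([4])       # 4 of 64 groups discarded
two_pools = loss_probability([4, 4])   # same, in two independent digests
```

Under these assumptions, discarding 4 of 64 groups would lose a given rare fragment with probability 0.0625 in a single pool, but only about 0.0039 when two independently digested pools are used.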

Method III broadly concerns a method of sorting genes and/or gene
fragments comprising the steps of:

   (1) preparing ds cDNA molecules from mRNA molecules by reverse
   transcription, using a poly-T primer having a general primer-template
   sequence upstream from the poly-T sequence that includes a recognition
   sequence for a restriction enzyme, yielding ds cDNA molecules having the
   poly-T sequence, having the general primer-template sequence;

   (2) dividing the cDNA into N pools, wherein N is 1 to 25, by digesting
   the ds cDNA molecules with different restriction enzymes that produce
   digested cDNA molecules with cohesive ends having overhanging ssDNA
   sequences of a constant number of arbitrary nucleotides;

   (3) ligating to the digested cDNA molecules of each pool a set of dsDNA
   oligonucleotide adaptors, each of which adaptor has at one of its ends a
   cohesive-end ssDNA adaptor sequence complementary to one of the possible
   overhanging ssDNA sequences of the digested cDNA, at the opposite end a
   specific primer-template sequence specific for the ssDNA adaptor
   complementary sequence, and in between the ends a constant sequence that
   is the same for all of the different adaptors of the set;

   (4) amplifying by separate PCRs the ligated cDNA molecules of each
   pool, utilizing for each separate PCR a primer that anneals to the cDNA
   poly-T sequence optionally having the cDNA general primer-template, and a
   primer from a set of different specific primers that anneal to the cDNA
   specific primer-template sequences;

   (5) sorting the amplified cDNA molecules from each pool into non-
   redundant groups by collecting the amplification products after each separate
   PCR, each group of amplified cDNA molecules determined by the specific
   primer that annealed to the specific primer-template sequence and primed the
   PCR, wherein each of the restriction enzymes digests the N separate cDNA
   pools into 64 or 256 non-redundant sub-groups; and

   (6) digesting cDNA fragments in each non-redundant sub-group of the
   cDNA pools with different restriction enzymes and further purifying the
   digested cDNA fragments by removing the small end fragments produced by
   the digestion.

In one embodiment of the method of sorting genes and/or gene fragments,
the method further comprises purifying the digested cDNA fragments by removing
the small end fragments produced by the digestion.

The reactions of Methods I, II and III stop when the cDNAs are exhausted.

In another embodiment of the method of sorting genes and/or gene
fragments, the method further comprises ligating the digested cDNA fragments into
a plasmid vector that has a recognition sequence for a restriction enzyme and is
predigested with the enzyme, producing a set of ligations.

In another embodiment of the method of sorting genes and/or gene
fragments, the restriction enzyme is NotI or AscI.

In another embodiment of the method of sorting genes and/or gene 
fragments, the method further comprises ligating the digested cDNA fragments into 
a genetic vector. 
In another embodiment of the method of sorting genes and/or gene
fragments, the genetic vector is a viral vector, a bacterial vector, a protozoan vector,
a retrotransposon, a transposon, a DNA vector, or a recombinant vector.

In another embodiment of the method of sorting genes and/or gene
fragments, the method further comprises transforming the ligation products into
bacteria and growing the bacteria in a suitable growth medium.

In another embodiment of the method of sorting genes and/or gene 
fragments, the bacteria are grown on bacteria growth plates. 

In another embodiment of the method of sorting genes and/or gene
fragments, N is two and the restriction enzymes of step (2) are BbsI for one pool and
BsaI for the second pool.

In another embodiment of the method of sorting genes and/or gene 
fragments, N is 2 to 20, preferably 2 to 15, more preferably 2 to 10 and most 
preferably 2 to 4. 

In another embodiment of the method of sorting genes and/or gene
fragments, N is two and the restriction enzyme in step (1) comprises AscI or another
similar rare restriction enzyme.

In yet another embodiment of the method of sorting genes and/or gene
fragments, N is two and the restriction enzyme in step (5) comprises BbsI or BsaI.

In a further embodiment of the method of sorting genes and/or gene
fragments, N is two and the restriction enzyme in step (6) comprises NotI or AscI.

This invention also provides a method of making sub-libraries of ligation
sets by ligating restriction enzyme digested fragments, produced by the method of
sorting genes and/or gene fragments, into a plasmid vector that has recognition
sequences for said restriction enzymes and is predigested with these enzymes, to make
64 x N or 256 x N sets of ligations, wherein N is 1 to 25.

This invention further provides a method of making sub-libraries of
expression system colonies by transforming the set of ligations into an expression
system to produce colonies of the expression system containing each of the 64 x N
or 256 x N non-redundant subgroups of cDNA fragments, wherein N is 1 to 25.

In one embodiment of the method of making sub-libraries of expression
system colonies, the expression system is a bacterium.

In another embodiment of the method of making sub-libraries of expression
system colonies, the bacteria are grown under suitable conditions.

In a further embodiment of the method of making sub-libraries of expression
system colonies, the bacteria are plated onto bacterial growth plates.

The practice of the present invention employs, unless otherwise indicated, conventional
techniques of molecular biology (including recombinant techniques), microbiology,
cell biology, biochemistry and immunology, which are within the skill of the art.
Such techniques are explained fully in the literature, such as "Molecular Cloning:
A Laboratory Manual", second edition (Sambrook et al. 1989); "Oligonucleotide
Synthesis" (Gait, ed. 1984); "Animal Cell Culture" (Freshney ed., 1987); Met.
Enzymol. (Academic Press, Inc.); "Handbook of Experimental Immunology" (Weir
and Blackwell, eds.); "Gene Transfer Vectors for Mammalian Cells" (Miller and
Calos eds. 1987); "Current Protocols in Molecular Biology" (Ausubel et al. eds.
1987); "PCR: The Polymerase Chain Reaction" (Mullis et al. eds. 1994); and
"Current Protocols in Immunology" (Coligan et al. eds. 1991). These techniques are
applicable to the production of the polynucleotides of the invention, and, as such,
may be considered in making and practicing the invention. This invention can be
applicable to the uses disclosed in PCT publications, such as WO 98/51789A2, WO
93/18176A1 and WO 99/60164.

Reference is made to U.S. Patent Nos.: 5,407,813; 5,413,909; 5,487,985;
5,508,169; 5,556,773; 5,580,726; 5,629,179; 5,650,274; 5,695,937; 5,700,644;
5,710,000; 5,728,524; 5,763,239; 5,804,382; 5,814,445; 5,837,468; 5,858,656;
5,863,722; 5,866,330; and 5,871,697; PCT publication WO 94/01582; Guilfoyle et
al. (1997) Nucl. Acids Res. 25:1854-1858; Ivanova and Belyavsky (1995) Nucl.
Acids Res. 23:2954-2958; Mahadeva et al. (1998) J. Mol. Biol. 284:1391-1398;
Troutt et al. (1992) Proc. Natl. Acad. Sci. USA 89:9823-9825; Kato (1995) Nucl.
Acids Res. 23:3685-3690; Prashar and Weissman (1996) Proc. Natl. Acad. Sci. USA
93:659-663; Ko Nucl. Acids Res. 18:5705-5711; Edward Nucl. Acids Res. 19:5227-
5232; Hoog Nucl. Acids Res. 19:6123-6127; Sokolov et al. (1994) Nucl. Acids Res.
22:4009-4015; Schmidt and Mueller Nucl. Acids Res. 24:1789-1791; Belyavsky et
al. (1989) Nucl. Acids Res. 17:2919-2932; Calvet (1991) Ped. Nephrol. 5:751-757;
Cooke et al. (1996) Plant J. 9:101-124; Domec et al. (1990) Anal. Biochem.
188:422-426; Haymerle et al. (1986) Nucl. Acids Res. 14:8615-8625; Kato et al.
(1994) Gene 150:243-250; Kohchi et al. (1995) Plant J. 8:771-776; Patanjali et al.
(1991) Proc. Natl. Acad. Sci. USA 88:1943-1947; Podhajska et al. (1992) Met.
Enzymol. 216:303-309; and Szybalski et al. (1991) Gene 100:13-26; and the
documents cited therein and the documents of record in the prosecution of the cited U.S.
patents; all of which are incorporated herein by reference.

With respect to cDNAs for expression in a vector and documents providing
such exogenous DNA, as well as with respect to the expression of transcription
and/or translation factors for enhancing expression of nucleic acid molecules,
reference is made to U.S. Patent No. 5,990,091, and WO 98/00166 and WO
99/60164, and the documents cited therein and the documents of record in the
prosecution of that patent and those PCT applications; all of which are incorporated
herein by reference. Thus, U.S. Patent No. 5,990,091 and WO 98/00166 and WO
99/60164 and documents cited therein and documents of record in the prosecution of
that patent and those PCT applications, and other documents cited herein or
otherwise incorporated herein by reference, can be consulted in the practice of this
invention; and all exogenous nucleic acid molecules and vectors cited therein can be
used in the practice of this invention. In this regard, mention is also made of U.S.
Patent Nos.: 6,004,777; 5,997,878; 5,989,561; 5,976,552; 5,972,597; 5,858,368;
5,863,542; 5,833,975; 5,863,542; 5,843,456; 5,766,598; 5,766,597; 5,762,939;
5,756,102; 5,756,101; and 5,494,807. The expression systems are disclosed in U.S.
Patent Nos.: 5,538,885; 5,641,663; 5,830,692; and 6,004,941.

As used herein, "vectors" include, but are not limited to, viral, bacterial,
protozoan, DNA, retrotransposon and transposon vectors, or a recombinant vector thereof.

As used herein, "rare restriction enzymes" means restriction enzymes having
a low chance of cleaving within the cDNA. Generally, enzymes that have a
recognition sequence of 8 or more base pairs are regarded as rare enzymes since
they cleave, statistically, once every 4^ bp (~1 per 16,000 bp).

EXPERIMENTAL DETAILS

The following examples illustrate some embodiments of the present
invention in more detail. However, the following examples should not be construed
as limiting the scope of the present invention.
Example 1

cDNA Preparation

Conversion of mRNA into ds cDNA

For priming the synthesis of single stranded cDNA from polyA+ mRNA an
oligo(dT) primer was used. The primer was of the following structure, including a
general primer-template sequence:

5'-TGCATGGCACAGTACTGAGTGGTATCGACTCGTACAGGCGC
GCCTTTTTTTTTTTTTTTTTTV-3' (SEQ ID NO: 1) (V=C, G or A).

General primers for amplification from this sequence include GPI1, GPI2 and
GPI3, SEQ ID NOs: 2, 3 and 4, respectively.

GPI1  TGCATGGGACAGTACTGAGT
GPI2  CACAGTACTGAGTGGTATCG
GPI3  AGTGGTATCGACTCGTACAG

As depicted here, the three general primers are nested relative to each other.

Conventional methods were used for preparation of double stranded cDNA
from polyA+ mRNA. The double stranded cDNA was column purified (Qiagen
QIAquick PCR purification kit, catalogue no. 28106) to remove excess oligo(dT)
primer and nucleotides.

ds cDNA digestion; restriction enzyme choice 

The double stranded cDNA was digested with a type IIs restriction enzyme
(RE) that produced a four base overhang structure and that cut at least 8 nucleotides
away from the recognition sequence. Other enzymes, including type II restriction
enzymes, that produce other overhangs or that cut closer to the recognition sequence
can be used. REs used were:

BbvI 5'-GCAGCNNNNNNNN-3' (SEQ ID NO: 5)
     3'-CGTCGNNNNNNNNNNNN-5' (SEQ ID NO: 6)

FokI 5'-GGATGNNNNNNNNN-3' (SEQ ID NO: 33)
     3'-CCTACNNNNNNNNNNNNN-5' (SEQ ID NO: 7)

In the examples shown in Figure 4, double stranded cDNA derived from
rat liver mRNA was digested with BbvI and ligated to Tail adaptor set 2. The helper
oligonucleotide was HOLL2 for Figure 4A and HOLL1 for Figure 4B. Ligation
specificity was tested on the albumin gene, which constitutes the most abundant
mRNA in rat liver, using a Tail primer and an albumin
reverse primer. The specific Tail adaptor that should ligate to the albumin gene is
Ada-ACT. All nine Tail adaptors that have a one-base mismatch with the albumin-
specific Tail adaptor were examined. Both Figure 4A and Figure 4B show
separation of DNA on a 1.5% agarose gel; ethidium bromide staining was
used to visualize the DNA.

In Figure 4A, helper oligonucleotide HOLL2, a perfect match to the Tail
adaptor, was used. Oligonucleotide concentration was 5 pmol/25 µl. The correct
200 bp band is observed with the specific ACT Tail primer. However, Tails ATT,
AGT, AAT and TCT also give a strong 200 bp albumin band. A very weak 200 bp band
is observed in Tails ACG, ACC and ACA. Thus, the ligation conditions used here
allow frequent mis-ligations.

In Figure 4B, helper oligonucleotide HOLL1, which has a mismatch to the first
nucleotide of the constant region of the Tail adaptor, was used. Oligonucleotide
concentration was 2.5 pmol/25 µl. The 200 bp albumin-specific band is observed only
in the Tail-ACT amplification. None of the other Tails gave the albumin band. The
500 bp band observed in the ACG lane (also seen in Figure 4A) is caused by Tail-Tail
amplification of an undetermined gene. Thus, the ligation conditions used here give
highly specific ligation and do not allow mis-ligations.

Example 2 
Differential Ligation 

Adaptor design and sequence 
The digested ds cDNA was ligated to a set of oligonucleotide adaptors. Two
sets of adaptors were used: a set of 64 adaptors covering all 64 combinations of
three of the four nucleotides of the overhang; and a set of 256 adaptors covering all
256 combinations of the four nucleotides of the overhang.

Each adaptor comprises two DNA strands: a "long" 49-51 bp strand that
contains the sequence that fits into the overhang produced by the type IIs REs; and a
"short" 18-mer strand that complements the long strand up to the overhang. Three
structural versions of the short strand were examined:

5'-XYZNGCAGGTACGTCGTACCGCGGCCGCGTGAGCTTGAGTC
GCGTGGAT-3' long strand (SEQ ID NO: 8)

3'-CGTCCATGCAGCATGGCG-5' short strand 1 (SS1) (SEQ ID NO: 9)
3'-AGTCCATGCAGCATGGCG-5' short strand 2 (SS2) (SEQ ID NO: 10)
3'-CATCCATGCACCATGGCG-5' short strand 3 (SS3) (SEQ ID NO: 11)

Note that SS2 has a mismatch to the 5th nucleotide of the long strand (just
after the N) and that SS3 has a mismatch to the 6th nucleotide.

The general structure of the long strand of the adaptors is as follows for the
set of 64:

5'P-XYZN -- constant -- | -- specific -- |

where each of X, Y and Z can be any of the nucleotides but are specific for each
adaptor; N is a mix of all 4 nucleotides; P is a 5' phosphate; the constant region is a
sequence which is common to all 64 adaptors while the specific region is specific to
each of the 64 adaptors; each adaptor has a different specific sequence.

The general structure of the adaptors is as follows for the set of 256:

5'P-WXYZ -- constant -- | -- specific -- |

where each of W, X, Y and Z can be any of the nucleotides but are specific for each
adaptor; P is a 5' phosphate; the constant region is a sequence which is common to
all 256 adaptors while the specific region is specific to each of the 256 adaptors;
each adaptor has a different specific sequence. For each adaptor from the set of 64
and the set of 256 a specific primer, complementary to the specific region of the
adaptor, has been synthesized. Figure 18 shows tail adaptors set 256, which can be
represented by such a general formula.

The sequences of the entire sets of 64 and 256 adaptors can be generated
from the general structures for the set of 64 and the set of 256, respectively. The lists
of specific primer sets are shown in Figures 10 to 13.

Figure 10 shows the general structure of the primer and the primer set J2
having the general structure of the primers

5' XYZN GCAGGT ACGTCGTACC GCGGCCGC-x-x-x-x-x-x-x-x-x-x-x-x-3'
(SEQ ID NO: 12)

Bases(4) BspMI(6) constant(10) NotI(8) Tail(20)

wherein X, Y and Z can be any of A, T, C or G.

Figure 11 shows tail primers set J2 represented by the general formula:
Tail-XYZ 5' TCCACGCGACTCAAGCTCAC (SEQ ID NO: 13)
wherein X, Y and Z in the primer name can be any of A, T, C or G. The primer
sequence is different for each of the 64 different tail primers, and each one of them is
a complete reverse complement to the specific region of the tail adaptor that has the
same X, Y, Z.

Figure 12 shows tail primers (set number 2) represented by the general
formula: TnewXYZ 5' AACGACGCGTCGCGGTACCAG (SEQ ID NO: 14)
wherein Y and Z can be any of A, T, C or G.

Figure 13 shows tail primers set 256 represented by the general formula:
TailWXYZ 5' AACGCAGTGTTCGTTCGACGA (SEQ ID NO: 15) wherein each
of W, X, Y and Z can be any of A, T, C or G.
Ligation procedure

For ligations, all 64 or 256 adaptors are mixed in equal molar concentrations.
Initially, ligation conditions followed conventional methods. This included the use
of T4 DNA ligase at 16°C and using the SS1 strand. These ligation conditions
proved inadequate since ligation specificity was low, with adaptors ligating to
unmatched overhangs (Figure 4A).

The following conditions gave very high ligation specificity. 100 ng of digested
ds cDNA was placed in ligation buffer (50 mM Tris-HCl, pH 7.8; 10 mM MgCl2;
10 mM dithiothreitol; 26 µM NAD+; 25 µg/ml bovine serum albumin). Adaptor
concentration was 2.5 pmol/12 µl (long strand at 2.5 pmol/12 µl and the short strand at
10 pmol/12 µl). Importantly, short strand SS2, with one mismatch to the 5th
nucleotide of the long strand (just after the N), was used. Other short strands always
gave lower specificity. At this point the reaction volume was 10 µl. The reaction was
heated to 65°C for 5 minutes and then cooled to S'^C. 2 µl of E. coli DNA ligase
(10 units/µl) were added.

Incubation was carried out for 12 hours. The reaction was stopped by
heating to 65°C for 15 minutes and the reaction mix was stored at 4°C. Ligation
products were column purified (QIAquick spin) to remove unligated adaptors.

Example 3 
Analysis of ligation specificity 

Ligation specificity was tested on highly expressed genes. The following
example details an experiment performed on mRNA from rat liver. The most
abundant gene in this tissue is albumin, which was selected (as well as other genes not
shown here) to test ligation specificity. The type IIs RE used was BbvI. The 3'-most
BbvI site in the rat albumin gene (GenBank accession no. J00698) is at nucleotide
1740, 250 bp from the poly-A tail.

A reverse oligonucleotide 5'-CACCAACAGAAGAGATGAGTCCTG-3'
(SEQ ID NO: 16) matches nucleotides 1901 to 1881. The distance of this
oligonucleotide from the BbvI site is 160 bp.

The specific adaptor for ligation to this BbvI end of the rat albumin gene is
Ada-ACT: 3'AGATGCGGATCGGGCTCTGTGCGCCGGCGCCATGCTG
CATGGACGNTCA5' (SEQ ID NO: 17)

Amplification of the ligation product with specific-ACT
(5'-TCTACGCCTAGCCCGAGACAC-3' (SEQ ID NO: 18)) by PCR gave the
correct fragment size of 209 bp on an agarose gel (Figure 4, lane ACT). Had a
different adaptor managed to ligate (i.e. mis-ligate) to the end of the albumin cDNA,
then a different specific primer would have given a fragment of the same size.
Figure 4A shows the results of ligation done under non-specific conditions using a
short strand with no mismatches. Lanes ATT, AGT, AAT and TCT show the presence
of such a fragment after amplification with other tail primers, indicating the presence of
mis-ligation. However, when the conditions defined above were used, no mis-
ligations occurred (Figure 4B).

Additional experiments performed on the GAPDH sequence included testing
ligation specificity on all 64 specific adaptors. Upon digestion with a type IIs
restriction enzyme of a double-stranded cDNA derived from the mRNA of a specific
gene, fragments with specific overhangs are produced. The example below
describes the full human GAPDH cDNA sequence and the location of the
recognition sites for the BbvI type IIs restriction enzyme. The "><" symbol marks
the exact point where the enzyme cleaves the cDNA. The polyA addition signal
(AATAAA), found 20 to 30 bases before the actual polyA addition site, is
underlined. Also underlined, in the more upstream regions, are the BbvI recognition
sequences. The example given here is in addition to the rat albumin example.

GTTCGACAGTCAGCCGCATCTTCTTTTGCGTCGCCAGCCGAGCCACATCG 

CTCAGACACCATGGGGAAGGTGAAGGTCGGAGTCAACGGATTTGGTCGT 

ATTGGGCGCCTGGTCACCAGGGCTGCTTTTAACTCTGGTAAAGTGGATAT

TGTTGCCATCAATGACCCCTTCATTGACCTCAACTACATGGTTTACATGT 
TCCAATATGATTCCACCCATGGCAAATTCCATGGCACCGTCAAGGCTGA
GAACGGGAAGCTTGTCATCAATGGAAATCCCATCACCATCTTCCAGGAG 
CGAGATCCCTCCAAAATCAAGTGGGGCGATGCTGGCGCTGAGTACGTCG 
TGGAGTCCACTGGCGTCTTCACCACCATGGAGAAGGCTGGGGCTCATTT 
GCAGGGGGGAGCCAAAAGGGTCATCATCTCTGCCCCCTCTGCTGATGCC 

CCC ATGTTCGTCATGGGTGTGAACCATGAGAAGTATGACAACAGCCTCA
AGATCATCAGCAATGCCTCCTGCACCACCAACTGCTTAGCACCCCTGGCC 
AAGGTCATCCATGACAACTTTGGTATCGTGGAAGGACTCATGACCACAG 
TCCATGCCATCACTGCCACCCAGAAGACTGTGGATGGCCCCTCCGGGAA 
ACTGTGGCGTGATGGCCGCGGGGCTCTCCAGAACATCATCCCTGCCTCTA 

CTGGCGCTGCC AAGGCTGTGGGCAAGGTCATCCCTGAGCTGAACGGGAA
GCTCACTGGCATGGCCTTCCGTGTCCCCACTGCCAACGTGTCAGTGGTGG 
ACCTGACCTGCCGTCTAGAAAAACCTGCCAAATATGATGACATCAAGAA 
GGTGGTGAAGCAGGCGTCGGAGGGCCCCCTCAAGGGCATCCTGGGCTAC 
ACTGAGCACCAGGTGGTCTCCTCTGACTTCAACAGCGACACCCACTCCTC 

CACCTTTGACGCTGGGGCTGGCATTGCCCTCAACGACCACTTTGTCAAGC
TCATTTCCTGGTATGACAACGAATTTGGCTACAGCAACAGGGTGGTGGA 
CCTCATGGCCCACATGGCCTCCAAGGAGTAAGACCCCTGGACCACCAGC 
CCCAGCAAGAGCACAAGAGGAAGAGAGAGACCCTCACTGCTGGGGAGT 
CCCTGCCACACTCAGTCCCCCACCACACTGAATCTCCCCTCCTCACAGTT 

GCCATGTAGACCCCTTGAAGAGGGGAGGGGCCTAGGGAGCCGCACCTTG
TCATGTACCATC AATAAA GTACCCTGTGCTCAACC (SEQ ID NO: 19)

The expected end of the 3' fragment is: 

5'-AAGTGTTGCAAGGCTGCCGACAAGGATAAC-3' (SEQ ID NO: 20) 
3'-CAACGTTCCGACGGCTGTTCCTATTG-5' (SEQ ID NO: 34) 

The cDNA derived in the SDGI procedure has an extended polyA tail of a
specific sequence. This is underlined in the sequence below, which describes the
exact double stranded structure of the 3'-most fragment of the human
GAPDH cDNA. Note the overhang structure of the 5' end. The BbvI recognition
sequence is underlined.

GCCTCTACTGGCGCTGGATGACCGCGACCC AAGGCTGTGGGC A AGGTC A
TCCCTGAGCTGAACGGGAAGCTCACTGGCATGGCCTTCCGTGTCCCCACG 
GTTCCGACACCCGTTCCAGTAGGGACTCGACTTGCCCTTCGAGTGACCGT
ACCGGAAGGCACAGGGGTGTGCCAACGTGTCAGTGGTGGACCTGACCTG 
CCGTCTAGAAAAACCTGCCAAATATGATGACATCAAGAAGACGGTTGCA 
CAGTCACCACCTGGACTGGACGGCAGATCTTTTTGGACGGTTTATACTAC 
TGTAGTTCTTCGTGGTGAAGCAGGCGTCGGAGGGCCCCCTCAAGGGCAT 

CCTGGGCTACACTGAGCACCAGGTGGTCTCCTCACCACTTCGTCCGCAGC
CTCCCGGGGGAGTTCCCGTAGGACCCGATGTGACTCGTGGTCCACCAGA 
GGACTGACTTCAACAGCGACACCCACTCCTCCACCTTTGACGCTGGGGCT 
GGCATTGCCCTCAACGACCACTTGACTGAAGTTGTCGCTGTGGGTGAGG 
AGGTGGAAACTGCGACCCCGACCGTAACGGGAGTTGCTGGTGAATGTCA 

AGCTCATTTCCTGGTATGACAACGAATTTGGC TACAGCAACAGGGTGGT

GGACCTCATGGCCCACACAGTTCGAGTAAAGGACCATACTGTTGCTTAA 
ACC GATGTCGTTGTCCCACCACCT GGAGTACCGGGTGATGGCCTCCAAG 

GAGTAAGACCCCTGGACCACCAGCCCCAGCAAGAGCACAAGAGGAAGA 
GAGAGACCCTTACCGGAGGTTCCTCATTCTGGGGACCTGGTGGTCGGGG 
TCGTTCTCGTGTTCTCCTTCTCTCTCTGGGACACTGCTGGGGAGTCCCTGC
CACACTCAGTCCCCCACCACACTGAATCTCCCCTCCTCACAGTTGCCATG 
GTGACGACCCCTCAGGGACGGTGTGAGTCAGGGGGTGGTGTGACTTAGA 
GGGGAGGAGTGTCAACGGTACTAGACCCCTTGAAGAGGGGAGGGGCCT 
AGGGAGCCGCACCTTGTCATGTACCATC AATAAA GTACCCTGTATCTGG 

t 

GGAACTTCTCCCCTCCCCGGATCCCTCGGCGTGGAACAGTACATGGTAGT
TATTTCATGGGACAGCTCAACCAAAAAAAAAAAAAAAAAACGAGTTCCT 
TTTTTTTTTTTTTTTTT (SEQ ID NO: 21) 
The specific adaptor that will ligate to this overhang is: 
GGTACGACGTTCAGCAGCCTCTACTGGCGCTG (SEQ ID NO: 35) 

Ada AGG 3' CCAATAGGCAGCCGCCGCTGCCATGCTGCAAGTCGANGGAGA': 
ACCGCGAC (SEQ ID NO: 22) 

Note the mismatch in the upper sequence (helper) of the adaptor, marked by
an underline. To the right of the adaptor, the end of the human GAPDH sequence is
shown to emphasize the match between the adaptor and the overhang. Ligation
specificity is examined by the ability of the "TAIL" primer that matches the 3'
(specific) part of the adaptor (Tail AGG 5' GGTTATCCGTCGGCGGCGAC 3')
(SEQ ID NO: 23) to amplify, in combination with a GAPDH-specific reverse primer
(underlined above - 5' TACAGCAACAGGGTGGTGGA 3') (SEQ ID NO: 24).
This PCR amplification should result in a fragment of a specific size, 390 bp in the
example of GAPDH (350 + 40 of the adaptor). Complete specificity is achieved
when all of the other TAIL primers are unable to amplify the GAPDH sequence.
This is what is shown in Figure 5, where all 64 TAIL plus GAPDH reverse PCR
amplifications were performed and only the Ada-AGG TAIL gave the expected
fragment of 390 bp.
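The specificity criterion described here can be written down as a small check: the expected product size is the gene-specific distance plus the adaptor contribution, and ligation is called specific only when that band appears with the matching TAIL primer and with no other. The sketch below is illustrative only. The function names are hypothetical, and the lane contents are invented placeholder values, not readings from Figure 5.

```python
def expected_product(distance_bp, adaptor_bp):
    """Expected PCR product: cDNA stretch plus the adaptor contribution."""
    return distance_bp + adaptor_bp


def is_specific(lanes, expected_tail, size_bp):
    """True if the band of size_bp appears in the expected lane and no other.

    `lanes` maps a TAIL primer name to the band sizes seen in that lane.
    """
    return all((size_bp in bands) == (tail == expected_tail)
               for tail, bands in lanes.items())


gapdh_size = expected_product(350, 40)   # 350 + 40 of the adaptor, as stated

# Placeholder lane data for illustration only (not measured results).
clean_lanes = {"AGG": [390], "ACT": [], "ACG": [500]}
leaky_lanes = {"AGG": [390], "ACT": [390], "ACG": []}

clean = is_specific(clean_lanes, "AGG", gapdh_size)
leaky = is_specific(leaky_lanes, "AGG", gapdh_size)
```

In this sketch an unrelated band in another lane does not count against specificity, whereas the expected band in a non-matching lane does, which mirrors the way mis-ligation is scored in the albumin example.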

Example 4 
Ligation Efficiency Analysis 

To examine ligation efficiency, the successful amplification of a set of rare
mRNAs was tested. As above, reverse primers for the specific genes were used in
combination with the specific primers that were expected to ligate to the ends of
these cDNAs. All reactions were performed under the conditions described above.

Example 5 

Amplification-division of the Different Groups (general-specific PCR)

The ligation could employ a mix of all 64 (or 256) adaptors. While the
following details the protocol performed on the set of 64 adaptors, the same protocol
applies to a set of 256 adaptors. To divide the ligated cDNA into 64 groups, 64 PCR
reactions were performed. Each reaction used a primer specific for one of the
specific ligation adaptors, and the general primer. This resulted in a specific
amplification of all cDNAs ligated with the specific adaptor.

PCR conditions were: 2 min. at 95°C; followed by 30 cycles of 1 min. at
95°C; 1 min. at 58°C; and 2 min. at 68°C, followed by incubation for 7 min. at
68°C. Figure 6 shows the products of the 64 specific-general-primed PCR reactions.
Southern blot analysis of the 64 reactions (Figure 7) demonstrates the specificity of
the procedure. After amplification with specific and general primers, GAPDH
mRNA was amplified only in the expected group (AGG).
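
As a rough planning aid, the cycling program above can be written down as data and its hold times summed. This is only a minimal sketch: the names `CYCLE` and `total_minutes` are illustrative assumptions, and ramp times between temperatures are ignored.

```python
# Hold steps of the general-specific PCR program described above.
# Each entry is (temperature in deg C, minutes). Ramp times are ignored.
INITIAL_DENATURATION = [(95, 2)]
CYCLE = [(95, 1), (58, 1), (68, 2)]
FINAL_EXTENSION = [(68, 7)]
N_CYCLES = 30


def total_minutes(initial, cycle, final, n_cycles):
    """Sum the hold times of a three-part cycling program."""
    per_cycle = sum(minutes for _, minutes in cycle)
    return (sum(minutes for _, minutes in initial)
            + n_cycles * per_cycle
            + sum(minutes for _, minutes in final))


if __name__ == "__main__":
    print(total_minutes(INITIAL_DENATURATION, CYCLE, FINAL_EXTENSION, N_CYCLES))
```

The same function can be reused for the nesting programs by swapping in their step lists.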

The PCR products were column purified (QIAquick spin) to remove the
unincorporated primers and nucleotides. In this step, an mRNA source of 10,000
genes was divided into 64 groups, each containing an average of 150 cDNA species
(genes). If the source contains all 100,000 human genes, each group will contain an
average of 1500 cDNA species (genes).
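
The group sizes quoted above follow from dividing the number of expressed genes by the number of groups. The sketch below reproduces that arithmetic under the idealizing assumption that species are spread evenly over the groups; the function name `mean_species_per_group` is hypothetical, and the text's figures of 150 and 1500 are rounded versions of the values it computes.

```python
def mean_species_per_group(n_genes, n_groups):
    """Average number of distinct cDNA species per group, assuming the
    species are spread evenly over the groups (an idealization)."""
    return n_genes / n_groups


if __name__ == "__main__":
    for n_genes in (10_000, 100_000):
        for n_groups in (64, 256):
            mean = mean_species_per_group(n_genes, n_groups)
            print(f"{n_genes} genes / {n_groups} groups = {mean:.1f} per group")
```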
Example 6

Priming Nesting Procedure 

First nesting 

In this step, each of the 64 groups from the previous step was further sorted
into 256 subgroups. Division into fewer groups is also possible. A set of nesting
oligonucleotides, the 1st nesting set, was used. This set of 256 nesting primers
could be used on all 64 general-specific primed groups (as well as on the 256
general-specific primed groups) since they prime from the "constant" region of the
specific adaptor. The overall structure of a 1st nesting primer is:
5'-GCGGCCGCGGTACGACGTACCTGCNIIIWXYZ-3' (SEQ ID NO: 25) where
I = inosine; N = any nucleotide; each of W, X, Y, Z = C, G, T or A.

The NIII nucleotides match the four nucleotides in the specific adaptor used
to ligate to the overhang end of the cDNAs. The inosine nucleotides can match any
of the regular nucleotides. The WXYZ nucleotides, covering all 256 possibilities of
C, G, T or A, allow nesting into the four nucleotides adjacent to the overhang. The
first nesting oligonucleotide list is shown in Figures 14 and 15.

Figure 14 shows first nesting primers 256 for tail adaptor 64 set 1,
represented by the formula: 5' GGTACGACGTTCAGCTNIIIWXYZ (SEQ ID
NO: 26) wherein W, X, Y and Z can be any of A, T, C or G.

Figure 15 shows first nesting primers 64 for tail adaptor 64 set 1, represented
by the formula: 5' GGTACGACGTTCAGCTNIIIXYZ (SEQ ID NO: 27) wherein
X, Y and Z can be any of A, T, C or G.
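
The primer sets of Figures 14 and 15 can be enumerated mechanically from their formulas (constant region, then N, three inosines, then four or three selective bases). The following is a minimal sketch, not the procedure used to prepare the figures: `N` and `I` are kept as literal symbols in the output strings, and the function name `first_nesting_primers` is an assumption made for illustration.

```python
from itertools import product

CONSTANT = "GGTACGACGTTCAGCT"  # constant region from the Figure 14/15 formulas
BASES = "ACGT"


def first_nesting_primers(n_selective=4):
    """Map each selective 3' tail to its full primer string, written 5' to 3'.

    'N' (any nucleotide) and 'I' (inosine) are left as symbols rather than
    expanded, since they are not selective positions.
    """
    primers = {}
    for combo in product(BASES, repeat=n_selective):
        tail = "".join(combo)
        primers[tail] = CONSTANT + "N" + "III" + tail
    return primers


if __name__ == "__main__":
    primers_256 = first_nesting_primers(4)  # WXYZ set (Figure 14 formula)
    primers_64 = first_nesting_primers(3)   # XYZ set (Figure 15 formula)
    print(len(primers_256), len(primers_64))
    print(primers_256["ACGT"])
```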

An optional λ exonuclease reaction can be performed to eliminate carry-over
of cDNA from the original cDNA reaction. This works because the oligo(dT) primer
used to produce the cDNA in the reverse transcription reaction is phosphorylated,
whereas the general and specific primers used for general-specific primed
amplifications are not. The following mixture was prepared: 2 µl of purified
general-specific primed PCR product; 6 µl H2O; 1 µl λ exonuclease buffer; and 1 µl
λ exonuclease. The reaction mixture was then column purified.

For nesting, a 1:500 dilution of the general-specific PCR product was taken.
PCR reaction constituents were standard (including anti-Taq antibody). Cycling
conditions were: 1 min. at 95°C; 1 min. at 59°C; and 2 min. at 70°C. 30 cycles
were performed. After PCR, the unincorporated primers and nucleotides were
removed using QIAquick spin columns.

The 1st nesting stage divides each of the 64 groups into 256 groups for a
total of 16,384 groups. Thus, for an mRNA source of 10,000 genes, each of the 256
1st nesting tubes should contain an average of less than 1 cDNA species (gene).
This means that most tubes (>100) will contain one cDNA species, some will be
empty and a few will contain more than one cDNA species. Figure 8 shows the
results of a 1st nesting PCR done on 3 of the 64 groups. The object of the nesting
PCR was to isolate three specific genes according to the sequences around the BbvI
site closest to the 3' end.

For a source containing all 100,000 human genes, each of the 256 tubes will
contain an average of 6 cDNA species (genes). Thus, a further nesting round would
be needed to achieve only one gene per well.
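
To get a feel for how tubes divide between empty, singly occupied and multiply occupied, one can assume, purely for illustration, that species fall into tubes independently and with equal probability. That Poisson model is an assumption not stated in the text, and real overhang and flanking sequences are not uniformly distributed, so the output is a rough guide only. The function name `occupancy` is hypothetical.

```python
import math


def occupancy(n_genes, n_tubes):
    """Poisson estimate of tube occupancy.

    Returns (mean species per tube, fraction empty, fraction holding exactly
    one species, fraction holding more than one).
    """
    lam = n_genes / n_tubes
    p_empty = math.exp(-lam)
    p_single = lam * math.exp(-lam)
    p_multi = 1.0 - p_empty - p_single
    return lam, p_empty, p_single, p_multi


if __name__ == "__main__":
    n_tubes = 64 * 256  # tubes after the 1st nesting stage
    for n_genes in (10_000, 100_000):
        lam, p0, p1, p2 = occupancy(n_genes, n_tubes)
        print(f"{n_genes} genes: mean {lam:.2f} per tube, "
              f"empty {p0:.2f}, single {p1:.2f}, multiple {p2:.2f}")
```

Passing a larger tube count models a further nesting round in the same way.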

Second nesting 

In this stage, each of the 256 1st nesting groups was further divided into 16
groups. As for the 1st nesting primers, this set of 16 2nd nesting primers can be
used on all 1st nesting primer reactions, since they prime from the "constant" region
of the ligation adaptors.

The primers used for the 2nd nesting are of the structure:
5'GCGGCCGCGGTACGACGTACCTGCNGGGIIIINNXY3' (SEQ ID NO: 28)
where I = inosine; N = any nucleotide; each of W, X, Y and Z can be any of C, G, T or
A. In places where inosine was present in the 1st nesting primer, a "G" is placed in
the 2nd nesting primers (since a "G" is incorporated as a match to "I"). Lists
detailing second nesting primers are shown in Figures 16 and 17.

Figure 16 shows second nesting primers 64 for tail adaptor 64 set 1,
represented by the general formula: 5' GGTACGACGTTCAGCTNGGGIIIXYZ
(SEQ ID NO: 29) wherein each of X, Y and Z can be any of A, T, C or G.

Figure 17 shows second nesting primers 16 for tail adaptor 64 set 1,
represented by the general formula: 5' GGTACGACGTTCAGCTNGGGIIIXY
(SEQ ID NO: 30) wherein each of X and Y can be any of A, T, C or G.

A 1:500 dilution of the purified 1st nesting PCR products was used for a
PCR reaction performed with exactly the same constituents and conditions as
described above. Figure 9 shows the results of 2nd nesting PCR on the three groups
shown in Figure 8. Highly pure DNA fragments were obtained.

The 2nd nesting stage divides each of the 16,384 1st nesting groups into
sixteenths, for a total of 262,144 groups. Thus, about 100,000 of the groups should
contain cDNA products and more than 95% of them should contain only one gene or
gene fragment.
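
The group totals quoted for the successive sorting stages are running products of the per-stage division factors. The sketch below tabulates them; the stage labels and the helper name `cumulative_groups` are illustrative assumptions.

```python
# Division factor applied at each sorting stage of the priming-nesting route.
STAGES = [
    ("general-specific PCR", 64),
    ("1st nesting", 256),
    ("2nd nesting", 16),
]


def cumulative_groups(stages):
    """Return (stage name, total number of groups so far) for each stage."""
    total = 1
    totals = []
    for name, factor in stages:
        total *= factor
        totals.append((name, total))
    return totals


if __name__ == "__main__":
    for name, total in cumulative_groups(STAGES):
        print(f"{name}: {total} groups")
```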

Example 7 
Ligation Nesting Procedure 

Digestion of END-TAIL PCR product

The column purified PCR products of each of the 64 general-specific primed
groups described above were digested with BspMI under standard manufacturer
(New England BioLabs) conditions. The released adaptors were removed by
column purification (QIAquick spin).
First nesting ligation (Adaptor set #2)

100 ng of each of the 64 digested products was ligated with a mix of 64
adaptors (nesting ligation adaptor set). Ligation conditions were identical to those
detailed above for differential ligation. The same specificity and efficiency tests,
detailed above, were successfully performed. Each ligation was column purified
(QIAquick spin) to remove unligated adaptors. DNA was eluted from the column in
a final volume of 100 µl. Adaptor set #2 is shown in Figure 20. The tail adaptors 64
(set number 2) in Figure 20 can be represented by the general formula:

      |specific| constant | specific |
5' Ph-XYZN GGAGGTAGGTGGTAGG GCGGCCGC GTGAGCTTGAGTCGCGTGGA (SEQ ID NO: 31)

wherein X, Y and Z can be any of A, T, C or G.

Amplification of the first ligation products

Each of the 64 ligations was then divided into 64 tubes. The final number of
tubes was thus 4096. From each ligation tube 1 µl was taken for each of the 64
amplifications. Each amplification was done by one of the 64 specific primers and
the general primer. Amplification conditions were identical to those detailed in the
"Amplification-division of the different groups (general-specific primed PCR)"
described above. Each PCR reaction mixture was column purified (QIAquick spin)
to remove unincorporated primers and nucleotides.

Digestion of END-TAIL first ligation PCR product

The procedure detailed in the "Digestion of general-specific primed PCR
product" above is repeated.

Second nesting ligation

The procedure detailed for the 1st nesting ligation is repeated with adaptor
set #1 (Figure 19). The tail adaptors 64 (set number 1) of Figure 19 are represented
by the formula:

| constant | specific |

AdaXYZ 5ThTTTNAGGTGAAGGTCGTAGGCGTGGAAGGAAGAGGGGGGT 

(SEQ ID NO: 32) wherein each of X, Y and Z can be any of A, T, C or G. 

Example 8

Gene Analysis (Agarose Gel and Sequencing)

PCR products obtained from the 2nd nesting reaction (either priming or
ligation) are separated on an agarose gel to examine the presence of PCR products and
the number of fragments (Figure 9). Sorted or isolated cDNAs are purified and
sequenced using the constant region of the ligation adaptors as a primer.

Example 9 

Construction of cDNA Library from the Amplification 
Products Obtained from the Differential Ligation Step 

The double stranded cDNA, prepared as described above, was divided into
two pools. One pool was digested with BbsI and the second with BsaI. The
following procedure was done in parallel for each pool.

Following differential ligation, performed as described above using adaptor
set #2 for ligation, PCR amplification was performed with primer set #2.
Amplified products were column purified (QIAquick spin). The PCR products from
each of the 64 groups were digested with NotI and AscI and were column purified
(QIAquick spin). A plasmid that contains NotI and AscI sites in its multiple cloning
site is digested with NotI and AscI, and the linearized fragment is purified. The
purified NotI-AscI digested products are then ligated to the linearized plasmid.

Ligation products were transformed into bacteria using standard protocols.
Transformed bacteria were plated onto growth plates and, following standard
incubation, hundreds to thousands of colonies grew on each plate. For sequencing,
each plasmid was purified from picked colonies and prepared for sequencing using
standard protocols. Another option is to amplify the insert from the plasmid found
in the picked colonies using primers flanking the insert. The amplified inserts are
sequenced using standard protocols.

The double stranded cDNA, prepared as described above, is divided into
between 3 and 25 pools for digestion with restriction enzymes, ligation and
expression. Expression of the separated genes can be in a bacterium.

Having thus described in detail preferred embodiments of the present 
invention, it is to be understood that the invention defined by the appended claims is 
not to be limited by particular details set forth in the above description as many 
apparent variations thereof are possible without departing from the spirit or scope 
thereof.

CLAIMS

1. A method of sorting genes comprising:

(1) preparing ds cDNA molecules from mRNA molecules by reverse
transcription, using a poly-T primer optionally having a general primer- 
template sequence upstream from the poly-T sequence, yielding ds cDNA 
molecules having the poly-T sequence, optionally having the general primer- 
template sequence; 

(2) digesting the ds cDNA molecules with a restriction enzyme that 
produces digested cDNA molecules with cohesive ends having overhanging 
ssDNA sequences of a constant number of arbitrary nucleotides; 

(3) ligating to the digested cDNA molecules a set of dsDNA 
oligonucleotide adaptors, each of which adaptor has at one of its ends a 
cohesive-end ssDNA adaptor sequence complementary to one of the possible 
overhanging ssDNA sequences of the digested cDNA, at the opposite end a 
specific primer-template sequence specific for the ssDNA adaptor 
complementary sequence, and in between the ends a constant sequence that
is the same for all of the different adaptors of the set; 

(4) amplifying by separate PCRs the ligated cDNA molecules, utilizing 
for each separate PCR a primer that anneals to the cDNA poly-T sequence
optionally having the cDNA general primer-template, and a primer from a 
set of different specific primers that anneal to the cDNA specific primer- 
template sequences; and 

(5) sorting the amplified cDNA molecules into non-redundant groups by 
collecting the amplification products after each separate PCR, each group of
amplified cDNA molecules determined by the specific primer that annealed
to the specific primer-template sequence and primed the PCR.

2. The method according to claim 1, wherein the restriction enzyme is
selected from type IIs restriction enzymes.

3. The method according to claim 2, wherein the type IIs restriction
enzyme is BbvI, BspMI, FokI, HgaI, MbpI, BbsI, BsaI, NspMI, BsmBI or SfaNI.

4. The method according to claim 1, wherein the restriction enzyme is
selected from type II restriction enzymes.

5. The method according to claim 4, wherein the type II restriction
enzyme is BglI, BstXI or SfiI.

6. The method according to claim 1, wherein a complete set of 
oligonucleotide adaptors and specific primers contains an oligonucleotide adaptor and
a specific primer complementary to each of the possible overhanging ssDNA 
sequences of the digested cDNA. 

7. The method according to claim 1 wherein the 3'-most nucleotide of the
ssDNA complementary sequence of the oligonucleotide adaptor is an arbitrary
nucleotide N, which pairs with the 5'-most nucleotide of each of the possible
overhanging ssDNA sequences of the digested cDNA. 

8. The method according to claim 7, comprising using a complete set of 
oligonucleotide adaptors and specific primers, containing an oligonucleotide adaptor 
and a specific primer complementary to each of the possible overhanging ssDNA 
sequences of the digested cDNA excluding the 5'-most nucleotide that pairs with the
arbitrary nucleotide N of the oligonucleotide adaptor. 

9. The method according to claim 8, wherein a complete set of
oligonucleotide adaptors has 4, 16, 64, 256, or 1024 oligonucleotide adaptors;
wherein the constant number of arbitrary nucleotides is 1, 2, 3, 4, or 5.

10. The method according to claim 1 further comprising: 

(1) amplifying the sorted non-redundant groups of cDNA molecules by
nesting PCR, each amplification utilizing a primer that anneals to the cDNA
poly-T sequence optionally having the cDNA general primer-template
sequence, as well as one of a set of nesting primers with the following general
formula:

5'-[sequence complementary to the constant sequence of the
oligonucleotide adaptors]-NIx-[1-5 nucleotides
complementary to one of the possible sequences of 1-5
nucleotides immediately upstream from the overhanging
ssDNA sequence on the cDNA]-3' where N is an arbitrary
nucleotide; I is inosine; and x = 1, 2, 3 or 4, being one fewer
than the constant number of nucleotides in the overhanging
ssDNA sequences; and

(2) sorting the amplified cDNA molecules into non-redundant subgroups
by collecting the amplification products after each separate nesting PCR, each
non-redundant subgroup of cDNA molecules determined by the particular
nested primer that complemented the 1-5 nucleotides immediately upstream
from the overhanging ssDNA sequence on the cDNA.

11. The method according to claim 10 comprising using a complete set of 
nesting primers, containing a nesting primer complementary to each of the possible 
sequences of 1-5 nucleotides immediately upstream from the overhanging ssDNA 
sequence on the cDNA. 

12. The method according to claim 10, comprising conducting further 
PCRs with further nesting primers complementary to the next immediately upstream 
cDNA nucleotides, thereby sorting the amplified cDNA molecules further into non- 
redundant subgroups. 

13. The method according to claim 12, further comprising repeating the 
steps according to claim 10 until each non-redundant subgroup contains only one 
type of cDNA molecule, with every expressed-gene transcript in the mRNA sample 
uniquely represented in one of the non-redundant subgroups. 

14. A method of sorting genes comprising: 

(1) preparing ds cDNA molecules from mRNA molecules by reverse 
transcription, using a poly-T primer optionally having a general primer- 
template sequence upstream from the poly-T sequence, yielding ds cDNA 
molecules having the poly-T sequence, optionally having the general primer- 
template sequence; 

(2) digesting the ds cDNA molecules with a first restriction enzyme that 
produces digested cDNA molecules with cohesive ends having first 
overhanging ssDNA sequences of a constant number of arbitrary nucleotides; 

(3) ligating to the digested cDNA molecules a set of dsDNA 
oligonucleotide adaptors, each of which adaptor has at one of its ends a 
cohesive-end ssDNA adaptor sequence complementary to one of the possible 
first overhanging ssDNA sequences of the digested cDNA, at the opposite 
end a specific primer-template sequence specific for the ssDNA adaptor
complementary sequence, and in between the ends a constant sequence that 
is the same for all of the different adaptors of the set, and that contains a 
recognition site for a second restriction enzyme that can cleave the ligated 
cDNA molecules at a point further from the ligated oligonucleotide adaptor 
than the overhanging ssDNA sequences of the digested cDNA, and can 
create cohesive ends having second overhanging ssDNA sequences of a 
constant number of arbitrary nucleotides; 

(4) amplifying by separate PCRs the ligated cDNA molecules, 
utilizing for each separate PCR a primer that anneals to the cDNA poly-T
sequence optionally having the cDNA general primer-template, and a primer 
from a set of different specific primers that anneal to the cDNA specific 
primer-template sequences; and 

(5) sorting the amplified cDNA molecules into non-redundant 
groups by collecting the amplification products after each separate PCR,
each group of amplified cDNA molecules determined by the specific primer
that annealed to the specific primer-template sequence and primed the PCR.

15. The method of claim 14 wherein the first restriction enzyme is selected
from type II and type IIs restriction enzymes.

16. The method according to claim 14 wherein the second restriction
enzyme is selected from type IIs restriction enzymes.

17. The method according to claim 14 comprising using a complete set of 
oligonucleotide adaptors and specific primers, containing an oligonucleotide adaptor 
and a specific primer complementary to each of the possible first overhanging ssDNA 
sequences of the digested cDNA. 

18. The method according to claim 14 wherein the 3'-most nucleotide of
the ssDNA complementary sequence of the oligonucleotide adaptor is an arbitrary
nucleotide N, which pairs with the 5'-most nucleotide of each of the possible first
overhanging ssDNA sequences of the digested cDNA. 

19. The method according to claim 18 comprising using a complete set of 
oligonucleotide adaptors and specific primers, containing an oligonucleotide adaptor 
and a specific primer complementary to each of the possible first overhanging 
ssDNA sequences of the digested cDNA excluding the 5'-most nucleotide that pairs
with the arbitrary nucleotide N of the oligonucleotide adaptor. 

20. The method according to claim 14 further comprising: 

(1) digesting the sorted non-redundant groups of cDNA molecules with 
the second restriction enzyme, cleaving the ligated cDNA molecules at a 
point further from the ligated oligonucleotide adaptor than the overhanging 
ssDNA sequences of the digested cDNA, and creating cohesive ends having 
second overhanging ssDNA sequences of a constant number of arbitrary 
nucleotides; 

(2) ligating to the digested cDNA molecules a set of nesting dsDNA 
oligonucleotide adaptors, each of which adaptor has at one of its ends a 
cohesive-end ssDNA adaptor sequence complementary to one of the possible 
second overhanging ssDNA sequences of the digested cDNA, at the opposite 
end a specific primer-template sequence unique for the ssDNA adaptor 
complementary sequence, and in between the ends a constant sequence that 
is the same for all of the different adaptors of the set, and that contains the 
recognition site for the second restriction enzyme; 

(3) amplifying by separate PCRs the ligated cDNA molecules, 
utilizing for each separate PCR a primer that anneals to the cDNA poly-T
sequence optionally having the cDNA general primer-template, and a primer 
from a set of different specific primers that anneal to the cDNA specific 
primer-template sequences; and 

(4) sorting the amplified cDNA molecules into non-redundant
subgroups by collecting the amplification products after each separate PCR,
each subgroup of amplified cDNA molecules determined by the specific
primer that annealed to the specific primer-template sequence and primed the
PCR.

21. The method according to claim 20 comprising using a complete set of 
nesting dsDNA oligonucleotide adaptors, containing an oligonucleotide adaptor 
complementary to each of the possible second overhanging ssDNA sequences of the 
digested cDNA.

22. The method according to claim 20, further comprising conducting
further PCRs using further nesting oligonucleotide adaptors, optionally with different
restriction enzymes and recognition sites, thereby sorting the amplified cDNA
molecules further into non-redundant subgroups.

23. The method according to claim 22, further comprising repeating the 
steps according to claim 20 until each non-redundant subgroup contains only one type 
of cDNA molecule, with every expressed gene in the mRNA sample uniquely 
represented in one of the non-redundant subgroups. 

24. A method of sorting genes and/or gene fragments comprising the steps of:

(1) preparing ds cDNA molecules from mRNA molecules by reverse 
transcription, using a poly-T primer having a general primer-template 
sequence upstream from the poly-T sequence that includes a recognition 
sequence for a restriction enzyme, yielding ds cDNA molecules having the 
poly-T sequence, having the general primer-template sequence; 

(2) dividing the cDNA into N pools, wherein N is 1 to 25, by digesting 
the ds cDNA molecules with different restriction enzymes that produce 
digested cDNA molecules with cohesive ends having overhanging ssDNA 
sequences of a constant number of arbitrary nucleotides; 

(3) ligating to the digested cDNA molecules of each pool a set of dsDNA 
oligonucleotide adaptors, each of which adaptor has at one of its ends a 
cohesive-end ssDNA adaptor sequence complementary to one of the possible 
overhanging ssDNA sequences of the digested cDNA, at the opposite end a 
specific primer-template sequence specific for the ssDNA adaptor 
complementary sequence, and in between the ends a constant sequence that 
is the same for all of the different adaptors of the set; 

(4) amplifying by separate PCRs the ligated cDNA molecules of each 
pool, utilizing for each separate PCR a primer that anneals to the cDNA
poly-T sequence optionally having the cDNA general primer-template, and a 
primer from a set of different specific primers that anneal to the cDNA 
specific primer-template sequences;

(5) sorting the amplified cDNA molecules from each pool into non-
redundant groups by collecting the amplification products after each separate
PCR, each group of amplified cDNA molecules determined by the specific
primer that annealed to the specific primer-template sequence and primed the
PCR, wherein each of the restriction enzymes digests the N separate cDNA
pools into 64 or 256 non-redundant sub-groups; and

(6) digesting cDNA fragments in each non-redundant sub-group of the
cDNA pools with different restriction enzymes.

25. The method according to claim 24 further comprising purifying the 
digested cDNA fragments by removing the small end fragments produced by the 
digestion. 

26. The method according to claim 25 further comprising ligating the 
digested cDNA fragments into a plasmid vector that has recognition sequence for a 
restriction enzyme and is predigested with the enzyme, producing a set of ligations. 

27. The method according to claim 26, wherein the restriction enzyme is 
NotI or AscI.

28. The method according to claim 25 further comprising ligating the 
digested cDNA fragments into a genetic vector. 

29. The method according to claim 28, wherein the genetic vector is a 
viral vector, a bacterial vector, a protozoan vector, a retrotransposon, a transposon, a 
DNA vector, or a recombinant vector. 

30. The method according to claim 26 further comprising transforming the
ligation products into bacteria and growing the bacteria under suitable conditions. 

31. The method according to claim 30, wherein the bacteria are grown on 
bacteria growth plates. 

32. The method according to claim 24, wherein N is two and the 
restriction enzymes of step (2) are BbsI for one pool and BsaI for the second pool.

33. The method according to claim 24, wherein N is two and the 
restriction enzyme in step (1) comprises AscI or another similar rare restriction
enzyme. 

34. The method according to claim 24, wherein N is two and the 
restriction enzyme in step (5) comprises BbsI or BsaI.

35. The method according to claim 24, wherein N is two and the
restriction enzyme in step (6) comprises NotI or AscI.

36. A method of making sub-libraries of ligation sets by ligating
restriction enzyme digested fragments according to claim 24 into a plasmid vector
that has recognition sequences for said restriction enzymes and is predigested with
these enzymes, to make 64xN or 256xN sets of ligations, wherein N is 1 to 25.

37. A method of making sub-libraries of bacterial colonies, wherein the 
set of ligations according to claim 26 are transformed into an expression system to 
produce colonies of the expression system containing each of the 64xN or 256xN 
non-redundant subgroups of cDNA fragments, wherein N is 1 to 25.

38. The method according to claim 37, wherein the expression system is 
a bacterium.

39. The method according to claim 38, wherein the bacteria are placed in 
a suitable growth media. 

40. The method according to claim 39, wherein the growth media is 
bacterial growth plates. 
5 ' CAACAACGTGCCGTTCGATAG
5 ' GTACCACCCGAACGGTCGTAG 
GCACAACCGTCGACCGTACGA 
AAGCCGAGACGAGGTCTAACG 
ATCGCTGCGATCGGACGTTAG 
ACCGCAGACGTTCCGATACCG 
TCTACGTACGACGGTTCGGTA 
TACACCACGTGAATCCGCTAG 
ATCCTGGACAGAGTCGTCGAC 
TCGTGAGTCAAGAACCGTCGA 
5 ' ACAGCACACGTGATCcTTACG 
5 ' TGGTACACGCTCGATCCGTAAG 
CTCACTCGGGTCGTTGCGTATG 
gGaTTACACACGCAAgGATACG 
TGGCATCGTGCTTCTTCCGAT 
GACGTCCTCGCGAGAAATCGG 
AGTATCCAGCAGTGGGATGCG 
ACGAAGAGCGACCGAACCGTA 
GCAACTGCGGTTCGACGAATG 
ACGTTCGCGAGTCGAAATTCG 
AACGTGTCACTGCGTCGCGTT 
GTCTAGACGGAGAAGCAAAGC 
CGTTAGCGCTCGACGTTACGT 
GATCACTCCGCACGTCACGTA 
ACTAGTTACCGAGCGTCTACG 
CTATGCGAGAGACGCTCGTAG 
ACACGAACGGATGCGTTTCGC 
TACTAGCAGCAACGAAGCGAA 
CTAGACTCCGGTGTCGATCGT 
CGACTACGTCCCGACAACGAT 
AACTCGGAAGACGATGGTCGT 
AAGTATGGACGCATCGACGAC 
TGAAGGTCGACACGTTCGGTT 
AATACCGCGCAAACGTAACCA 
TAGACACAGGACCAGGGTTCG 
5 ' AGTACTTCGTGACGAGCGAAC
5 ' AACTAGAAGCTGCGGTTTGCG 
5 ' ACTAGCTGCGAACGGTCGCAA 
5 ' AGCATACGCTTACCTGCGACT 
5 ' ACGTGGAGCCTACGATAGTCG 
5 ' CCTAACCTCGAATCGCTCGAT 
5 ' ACCACGGCGCTACGGTATCGA 
5 ' ATGCCGTCGAGAGAGTTCGGT 
tailTATA 5' TCAACCACGAGTGACGATCGA
tailTAGT 5' ACTATCCTCGTCGTCAGTCGC
tailTAGG 5' AGGTTATCCGTCTGCCACGAC
tailTAGC 5' TCCACGACTGACGAACCGCAT
tailTAGA 5' GAGCTAGACGGAATCGATACG
tailTACT 5' AACGGAGCCGTCGATCTTCGT
tailTACG 5' CAGTACGTGGTCTTCGTTCGA
tailTACC 5' CGATCACCGCCGAAGTCAGCA
tailTACA 5' GTCAGACTCGCGTCTACGAAC
tailTAAT 5' CCATTCGAGTAAACGCGATTG
tailTAAG 5' ATAGTCGCTCGTTCCGAATCG
tailTAAC 5' GCCTTAGAGCCAGGAAGAACG
tailTAAA 5' GGTTCACGCACGTTAGCGTTC
tailGTTT 5' CCAATTCCTTCCCTGGCTCATC
tailGTTG 5' TCTCGGTCGCCTCGTCTAATC
tailGTTC 5' AGACTCCTCAGCTGACCTAGTC
tailGTTA 5' AGTCAGCTCGCCACTCGTAGT
tailGTGT 5' AGAGTACTCGAGTCAGTAGGC
tailGTGG 5' ACAGAGGAGTCGGGAACAACG
tailGTGC 5' CTTGGGTACCTGTGTCCGTTG
tailGTGA 5' ACAGTACGAAGCAATCTGTGA
tailGTCT 5' AGCTCGGAGAGCATAAGGACG
tailGTCG 5' TCTCGGGCATTACTGGATAGG
tailGTCC 5' CCTTAACCtGATCTGTCcCATG
tailGTCA 5' GTGCGAGTCCAGTTTGACTGA
tailGTAT 5' GGTGGCCAACCACAGCCTTC
tailGTAG 5' TGAGATGAGGTGTACGACTGC
tailGTAC 5' TGTCAATGCGCCAGTTGTCTA
tailGTAA 5' GCACCAACACCTAGTGGCATC
tailGGTT 5' GATCTGTAGAGCGGGAGGTCT
tailGGTG 5' TGGCTAAGGGTGCTGCCACGC
tailGGTC 5' ATGAGACTCCAGCCGAAACCT
tailGGTA 5' AGTGTAGGGACGACCTGCAGA
tailGGGT 5' GGCAACGGCATAGCTGATACA
tailGGGG 5' GATGCTGAGGTATGAGGCAACG
tailGGGC 5' ACGTCATTTGGCCTGTCTGCT
tailGGGA 5' GACTCACGTGCTCGAACTGCT
tailGGCT 5' AGTCGGCaTGTgGCAcAtcTc
tailGGCG 5' ACTCGGTAGACAGCCGCTAAC
tailGGCC 5' CTGGGACACGGTCACTATTCAC
tailGGCA 5' ACCCTTGGAACGCTGTACACA
tailGGAT 5' TCCGGACACGTAGTGAGACGT
tailGGAG 5' TGCCTTGCACTCTTACCTAGC
tailGGAC 5' TAGCCAGTATCGTGCACTTGG
tailGGAA 5' AAGCTTACCACCCTACACGAA
tailGCTT 5' AgGATGaTGACaTGGgTCGAa
tailGCTG 5' AACCTCCATGACAAGTCCTCC
tailGCTC 5' AACACCGTGGGACAGACATCT
tailGCTA 5' CCACGGAACATACAGGGCATT
tailGCGT 5' CATGAGCGTGGAGCTAAGCAT
tailGCGG 5' CATCTGTCACAAGGTACGAGG
tailGCGC 5' AgGaGATgGAaCGCTCGcACA
tailGCGA 5' TCTGTGTCCTCGACCAGCATC
tailGCCT 5' AACTCCAGGTGGAAGCTGGTT
tailGCCG 5' CAGACTCACATCGAACGTCAC
tailGCCC 5' TGTAACTCCGAATGGGACACC
tailGCCA 5' GTTGATGCTCTCCCTCACCTG
tailGCAT 5' GAGTCTGCCAACAAGGTCGAG
tailGCAG 5' GTTGTGAGGAACCGCAATGCA
tailGCAC 5' ACCTCAGTGAACAGCTCTCAG
tailGCAA 5' GATCCAGGTCGCTATCCACTG
tailGATT 5' CcACATGCGaTCTCAAaTCCa
tailGATG 5' TTGTCGTGACGACCTAGACGC
tailGATC 5' TTGAGGCGTCTAATCATCGGG
tailGATA 5' CGCTCAGCAATCGCCACTATC
tailGAGT 5' CATTATCACACATGAGCCGCC
tailGAGG 5' GAGGGGCAAGAGAAAACCACC
tailGAGC 5' AAGTCCAGCGAGCTGTCTTCC
tailGAGA 5' AgGCCgctTCTCAGtAAGGTC
tailGACT 5' GTGTACGCAGAGAACCCCACA
tailGACG 5' GGTCTCCTGGACAACAGTTCC
tailGACC 5' CaGttGCATCACtCtggCATC
tailGACA 5' AAGACCGAATCGCGAAATGAG
tailGAAT 5' GTTCAGACCACCCGGTTCACA
tailGAAG 5' TGCTACAGCAGGATCCTCTGG
tailGAAC 5' GATACCTAGACCGGCAGCAAC
tailGAAA 5' CACTGAGAGCTAGGAAACCCAC
tailCTTT 5' GGGATAAATCCTGATGCCGTC
tailCTTg 5' CAGTCTCAACCCTTGCCTGTC
tailCTTc 5' TCACGGAGCTCACCTAAGCAC
tailCTTa 5' GATTTGGAGCTGACCTGATGC
tailCTGT 5' GATGTATCTATGAAATCGAGT
tailCTGg 5' CAACCCCGTAACTCCGTTCAG
tailCTGc 5' CGTCGACTTGTGCGACCTTCG
tailCTGa 5' AACACGCACAACCAGGTCATG
tailCTCT 5' TCGTCTCCAGCTACTGGACTC
tailCTCg 5' TACGCTCAACACTTACAGACG
tailCTCc 5' GGGCAACAGCACCTACTATAC
tailCTCa 5' CGTCTGACCAGTCTTCCACTC
tailCTAT 5' GGGAGAGGTGTTTTCCAGTCG
tailCTAg 5' GACCCAAGTAGTCGTCGCGAA
tailCTAc 5' CACCATGGTGAATCAGGCTCC
tailCTAa 5' ACCTGAGTGTGGGAAGGTCGA
tailCGTT 5' TGCGAAACTGTCTGTCGGAAG
tailCGTg 5' GCTTTGGCAATCCTCAAGCAG
tailCGTc 5' TCGCTCCTGACTCATCGAACA
tailCGTa 5' CAGAGTCGGTACCATCTCGAC
tailCGGT 5' GCGGACAAAGGATATGTTGATC
tailCGGg 5' CACTAGGACCTTTTGTCGGAAG
tailCGGc 5' TAAGAGCGGTGCTAGCGTGAG
tailCGGa 5' GGAGCCTCGAGATTCGTTGGT
tailCGCT 5' GCCTGGTCTTTCAGCATGGAC
tailCGCg 5' CTTGTCAGCCGAACGTCTGTC
tailCGCc 5' ACGCTGCAAGGCGGATAACAG
tailCGCa 5' CAGCACATAGACAGGTGCCTCA
tailCGAT 5' ATCATCACGTTGCACCAAGGG
tailCGAg 5' TCCAGAGGAACGTACGACCCT
tailCGAc 5' GAACAGGAGACAGAGCGAGCA
tailCGAa 5' CTACGGTCAGTACGACGTGGA
tailCCTT 5' AAATTATTCGCTGGAGCGCTG
tailCCTg 5' CAGCTGCGGTGTAGCATACAG
tailCCTc 5' ACTCGTAATCGTTCCAGACGC
tailCCTa 5' ATACGTGTTATGGCCGGAAAG
tailCcgT 5' GCTCCGAAGTTAGGTTGGGAA
tailCcgg 5' GTTCACCCTTGCAACGATAGC
tailCcgc 5' AGGGAGACTCCCTACTCGGAT
tailCcga 5' GAGTTGCCAGACATGTACCAG
tailCcCt 5' GCCAGTTTCTTCCCACAAGCA
tailCcCg 5' GTGAACGAGTATGCGACCCAG
tailCcCc 5' TTGCCTGTATTGCAACGCCTA
tailCcCa 5' TGAGCTGCTGGAAGATCAGGA
tailCcAt 5' AGTAGGGGAATACGCAACATGA
tailCcAg 5' GATCCACTTCGAGGAGTGACC
tailCcAc 5' GTACCACATTCGCTCGACACG
tailCcAa 5' CATTTCCCTCTCGAATTGGCA
tailCaTT 5' TCCGATGTATCGCCGAGATGT
tailCaTg 5' ACCAACTGAGAAGGAAGGTCA
tailCaTc 5' CGAATCCTAGTCACCAGTACTC
tailCaTa 5' GGAAGGATGCACTCCTACCGA
tailCagT 5' AATAGCTCCCTCCCTCACCAC
tailCagg 5' GAGGACCATCTGCTACATCTC
tailCagc 5' ATTACTTCGCGGGTCCTAATC
tailCaga 5' CAGCGACAACAAAAGGCTATG
tailCaCT 5' GCGTTGACACCTCATCACTAG
tailCaCg 5' TCTACCACTCACCGTCCGAAC
tailCaCc 5' AGCATGCTTCTGAGGAAGTGC
tailCaCa 5' AGTCATCGTGGCTTGTGTTACA
tailCaAT 5' GACACTTGGCTATGGGTCCCA
tailCaAg 5' CACAGTACGTGAGAGCTCCAA
tailCaAc 5' GAAGCAACCCAACAGGACCAG
tailCaAa 5' AGAGACTCACCAGGAAGCAGCA
tailATTT 5' TGTGGTACAGCAGAAGGCTGA
tailATTG 5' TCCAAGTTCGCCAAAGCAGGA
tailATTC 5' CGTGCGATTCTGGAATGCTTC
tailATTA 5' ACTCGGAATGGTGGGAGAGGA
tailATGT 5' AGCAGATTCTCGAGGAAACCA
tailATGG 5' ACCTCTCTGGTCTGGTCAGCA
tailATGC 5' TGACAAGTGGATGAGTGAGCAG
tailATGA 5' GGATTTTTCGACCGTGGTACA
tailATCT 5' GCCTGAGAGCTTTACTCACCA
tailATCG 5' GCTTAGCTTCTGCGATGGCAC
tailATCC 5' CAGCAGTGTCAGGTAGCCTCA
tailATCA 5' AGACAAGAGGTTCTGGCACCA
tailATAT 5' TGGTGGGTCTATCAAGTCGCA
tailATAG 5' TGTCGTAGCCACTGATGCTAC
tailATAC 5' TCATCCCTGGCATCGATGCTC
tailATAA 5' GAGGTGCCTTCCCAGACAGAG
tailAGTT 5' CGTCTCTGGAGTCGTCCTCTC
tailAGTG 5' TGGAGTCACGGTCTATGGATG
tailAGTC 5' AGTCTCCTGGAATGACGTGGAC
tailAGTA 5' CCAGTGTCCTCACCTAGATCG
tailAGGT 5' AGCCTACGCCAGTTGTCCTTC
tailAGGG 5' CCTTGTAGAGGATACGAACGAC
tailAGGC 5' AGGTAGCACAGCCAGGAACTC
tailAGGA 5' TCGTACACGATCCATCAGCAG
tailAGCT 5' GAACCCTCTGCCTTCGAACAC
tailAGCG 5' CTCAACCTAGACCCCTTAAACC
tailAGCC 5' CTTAGCAACGTCCCAGAGGAG
tailAGCA 5' AGGAGATCACTGCGTCTGCTG
tailAGAT 5' CCAGCTGCTCACTTCATGCTC
tailAGAG 5' ACCAGTCTCTACTGAGGCCAG
tailAGAC 5' CTATTGCACTAGTGCCTGCCA
tailAGAA 5' TGCGGACACGACAGGATGTAG
tailACTT 5' CCAGTGCTACCTCAGATCCGT
tailACTG 5' GAATCGAGCTGAGGCTTCTCA
tailACTC 5' CAGGCGAATTAACCTCAAACG
tailACTA 5' GCTCGGGTATTTGCAGTAGCA
tailACGT 5' TGAGGAGTTACGTGCAGACGA
tailACGG 5' TGACAGTCGCTTGAACCATCC
tailACGC 5' ACAGACCACCAGCTGAGAGTG
tailACGA 5' GTCCATTCCCATCAACCAAGC
tailACCT 5' GTACGTCTAGTCTTGCTTGCAG
tailACCG 5' GACACTTGGGAGCTTCATGGA
tailACCC 5' CCTGCGTTTAACCAATGTGCA
tailACCA 5' ATCTACCTGCAATGATCTGCA
tailACAT 5' AGACCGTCTTCCAGTCGTGCT
tailACAG 5' ACCACCGATGATGTTCATGCT
tailACAC 5' TCCACCACAGTCCAGACTCCA
tailACAA 5' GACGAGTCGACGAGGTGTAAG
tailAATT 5' GACCTACGGAAGCTTAGCCCT
tailAATG 5' ACACCACCGCAACTAGCCAAC
tailAATC 5' CGTTGTGCCTAAGACCTGCGA
tailAATA 5' GGAACCAGAATCGGACCTGAC
tailAAGT 5' TGGAGTTGATGGGTCGAGCTG
tailAAGG 5' GACAGCTATGTTGCCGGTAGC
tailAAGC 5' TCAGAGTGGCACATACTGAGGA
tailAAGA 5' GATGGCACGTAGGCAAGCAAC
tailAACT 5' CTCTGTGCTTCGGGCCTAGTC
tailAACG 5' CGTATCACCTGTGTCCAGCAA
tailAACC 5' CTAACAACGGTGGCGTTCCA
tailAACA 5' TGCAACCTCGATCCCATACG
tailAAAT 5' GTGAGGAGCTGATGAGACTGA
tailAAAG 5' CGAACGGTTACGTCACCAAGG
tailAAAC 5' ACTTCAGTTCCTAGGCTCGTC
tailAAAA 5' AGGTCTCCATCACGACTCCAC
5' GGTACGACGTTCAGCTNIIIAAA
5' GGTACGACGTTCAGCTNIIIAAC
5' GGTACGACGTTCAGCTNIIIAAG
5' GGTACGACGTTCAGCTNIIIAAT
5' GGTACGACGTTCAGCTNIIIACA
5' GGTACGACGTTCAGCTNIIIACC
5' GGTACGACGTTCAGCTNIIIACG
5' GGTACGACGTTCAGCTNIIIACT
5' GGTACGACGTTCAGCTNIIIAGA
5' GGTACGACGTTCAGCTNIIIAGC
5' GGTACGACGTTCAGCTNIIIAGG
5' GGTACGACGTTCAGCTNIIIAGT
5' GGTACGACGTTCAGCTNIIIATA
5' GGTACGACGTTCAGCTNIIIATC
5' GGTACGACGTTCAGCTNIIIATG
5' GGTACGACGTTCAGCTNIIIATT
5' GGTACGACGTTCAGCTNIIICAA
5' GGTACGACGTTCAGCTNIIICAC
5' GGTACGACGTTCAGCTNIIICAG
5' GGTACGACGTTCAGCTNIIICAT
5' GGTACGACGTTCAGCTNIIICCA
5' GGTACGACGTTCAGCTNIIICCC
5' GGTACGACGTTCAGCTNIIICCG
5' GGTACGACGTTCAGCTNIIICCT
5' GGTACGACGTTCAGCTNIIICGA
5' GGTACGACGTTCAGCTNIIICGC
5' GGTACGACGTTCAGCTNIIICGG
5' GGTACGACGTTCAGCTNIIICGT
5' GGTACGACGTTCAGCTNIIICTA
5' GGTACGACGTTCAGCTNIIICTC
5' GGTACGACGTTCAGCTNIIICTG
5' GGTACGACGTTCAGCTNIIICTT
5' GGTACGACGTTCAGCTNIIIGAA
5' GGTACGACGTTCAGCTNIIIGAC
5' GGTACGACGTTCAGCTNIIIGAG
5' GGTACGACGTTCAGCTNIIIGAT
5' GGTACGACGTTCAGCTNIIIGCA
5' GGTACGACGTTCAGCTNIIIGCC
5' GGTACGACGTTCAGCTNIIIGCG
5' GGTACGACGTTCAGCTNIIIGCT
5' GGTACGACGTTCAGCTNIIIGGA
5' GGTACGACGTTCAGCTNIIIGGC
5' GGTACGACGTTCAGCTNIIIGGG
5' GGTACGACGTTCAGCTNIIIGGT
5' GGTACGACGTTCAGCTNIIIGTA
5' GGTACGACGTTCAGCTNIIICTC*
5' GGTACGACGTTCAGCTNIIIGTG
5' GGTACGACGTTCAGCTNIIIGTT
5' GGTACGACGTTCAGCTNIIITAA
5' GGTACGACGTTCAGCTNIIITAC
5' GGTACGACGTTCAGCTNIIITAG
5' GGTACGACGTTCAGCTNIIITAT
5' GGTACGACGTTCAGCTNIIITCA
5' GGTACGACGTTCAGCTNIIITCC
5' GGTACGACGTTCAGCTNIIITCG
5' GGTACGACGTTCAGCTNIIITCT
5' GGTACGACGTTCAGCTNIIITGA
5' GGTACGACGTTCAGCTNIIITGC
5' GGTACGACGTTCAGCTNIIITGG
5' GGTACGACGTTCAGCTNIIITGT
5' GGTACGACGTTCAGCTNIIITTA
5' GGTACGACGTTCAGCTNIIITTC
5' GGTACGACGTTCAGCTNIIITTG
5' GGTACGACGTTCAGCTNIIITTT
5' GGTACGACGTTCAGCTNGGGIIIAAA
5' GGTACGACGTTCAGCTNGGGIIIAAC
5' GGTACGACGTTCAGCTNGGGIIIAAG
5' GGTACGACGTTCAGCTNGGGIIIAAT
5' GGTACGACGTTCAGCTNGGGIIIACA
5' GGTACGACGTTCAGCTNGGGIIIACC
5' GGTACGACGTTCAGCTNGGGIIIACG
5' GGTACGACGTTCAGCTNGGGIIIACT
5' GGTACGACGTTCAGCTNGGGIIIAGA
5' GGTACGACGTTCAGCTNGGGIIIAGC
5' GGTACGACGTTCAGCTNGGGIIIAGG
5' GGTACGACGTTCAGCTNGGGIIIAGT
5' GGTACGACGTTCAGCTNGGGIIIATA
5' GGTACGACGTTCAGCTNGGGIIIATC
5' GGTACGACGTTCAGCTNGGGIIIATG
5' GGTACGACGTTCAGCTNGGGIIIATT
5' GGTACGACGTTCAGCTNGGGIIICAA
5' GGTACGACGTTCAGCTNGGGIIICAC
5' GGTACGACGTTCAGCTNGGGIIICAG
5' GGTACGACGTTCAGCTNGGGIIICAT
5' GGTACGACGTTCAGCTNGGGIIICCA
5' GGTACGACGTTCAGCTNGGGIIICCC
5' GGTACGACGTTCAGCTNGGGIIICCG
5' GGTACGACGTTCAGCTNGGGIIICCT
5' GGTACGACGTTCAGCTNGGGIIICGA
5' GGTACGACGTTCAGCTNGGGIIICGC
5' GGTACGACGTTCAGCTNGGGIIICGG
5' GGTACGACGTTCAGCTNGGGIIICGT
5' GGTACGACGTTCAGCTNGGGIIICTA
5' GGTACGACGTTCAGCTNGGGIIICTC
5' GGTACGACGTTCAGCTNGGGIIICTG
5' GGTACGACGTTCAGCTNGGGIIICTT
5' GGTACGACGTTCAGCTNGGGIIIGAA
5' GGTACGACGTTCAGCTNGGGIIIGAC
5' GGTACGACGTTCAGCTNGGGIIIGAG
5' GGTACGACGTTCAGCTNGGGIIIGAT
5' GGTACGACGTTCAGCTNGGGIIIGCA
5' GGTACGACGTTCAGCTNGGGIIIGCC
5' GGTACGACGTTCAGCTNGGGIIIGCG
5' GGTACGACGTTCAGCTNGGGIIIGCT
5' GGTACGACGTTCAGCTNGGGIIIGGA
5' GGTACGACGTTCAGCTNGGGIIIGCC
5' GGTACGACGTTCAGCTNGGGIIIGGG
5' GGTACGACGTTCAGCTNGGGIIIGGT
5' GGTACGACGTTCAGCTNGGGIIIGTA
5' GGTACGACGTTCAGCTNGGGIIICTG
5' GGTACGACGTTCAGCTNGGGIIIGTG
5' GGTACGACGTTCAGCTNGGGIIIGTT
5' GGTACGACGTTCAGCTNGGGIIITAA
5' GGTACGACGTTCAGCTNGGGIIITAC
5' GGTACGACGTTCAGCTNGGGIIITAG
5' GGTACGACGTTCAGCTNGGGIIITAT
5' GGTACGACGTTCAGCTNGGGIIITCA
5' GGTACGACGTTCAGCTNGGGIIITCC
5' GGTACGACGTTCAGCTNGGGIIITCG
5' GGTACGACGTTCAGCTNGGGIIITCT
5' GGTACGACGTTCAGCTNGGGIIITGA
5' GGTACGACGTTCAGCTNGGGIIITGC
5' GGTACGACGTTCAGCTNGGGIIITGG
5' GGTACGACGTTCAGCTNGGGIIITGT
5' GGTACGACGTTCAGCTNGGGIIITTA
5' GGTACGACGTTCAGCTNGGGIIITTC
5' GGTACGACGTTCAGCTNGGGIIITTG
5' GGTACGACGTTCAGCTNGGGIIITTT
AdaTTTT 5' TTTTGCAGGTACGTCGTACCGCGGCCGCTCGTCGAACGAACACTGCGTT
AdaTTTG 5' TTTGGCAGGTACGTCGTACCGCGGCCGCTACGTCTCGTTCCGCTCTGCA
AdaTTTC 5' TTTCGCAGGTACGTCGTACCGCGGCCGCCTTCGCTAGACGACGTGCGCT
AdaTTTA 5' TTTAGCAGGTACGTCGTACCGCGGCCGCTCGCTCGTACTCCGTCTCAGA
AdaTTGT 5' TTGTGCAGGTACGTCGTACCGCGGCCGCATGCGCGAAAACGGTCACTGT
AdaTTGG 5' TTGGGCAGGTACGTCGTACCGCGGCCGCCGCTTTTCGTCGTCCCAGAGT
AdaTTGC 5' TTGCGCAGGTACGTCGTACCGCGGCCGCCTGGTAAACCCGGTTCGGTCA
AdaTTGA 5' TTGAGCAGGTACGTCGTACCGCGGCCGCCGATAGTCCGAGACGTCTQGT
AdaTTCT 5' TTCTGCAGGTACGTCGTACCGCGGCCGCCTATCGAACGGCACGTTGTTG
AdaTTCG 5' TTCGGCAGGTACGTCGTACCGCGGCCGCCTACGACCGTTCGGGTGGTAC
AdaTTCC 5' TTCCGCAGGTACGTCGTACCGCGGCCGCTCGTACGGTCGACGGTTGTGC
AdaTTCA 5' TTCAGCAGGTACGTCGTACCGCGGCCGCCGTTAGACCTCGTCTCGGCTT
AdaTTAT 5' TTATGCAGGTACGTCGTACCGCGGCCGCCTAACGTCCGATCGCAGCGAT
AdaTTAG 5' TTAGGCAGGTACGTCGTACCGCGGCCGCCGGTATCGGAACGTCTGCGGT
AdaTTAC 5' TTACGCAGGTACGTCGTACCGCGGCCGCTACCGAACCGTCGTACGTAGA
AdaTTAA 5' TTAAGCAGGTACGTCGTACCGCGGCCGCCTAGCGGATTCACGTGGTGTA
AdaTGTT 5' TGTTGCAGGTACGTCGTACCGCGGCCGCGTCGACGACTCTGTCCAGGAT
AdaTGTG 5' TGTGGCAGGTACGTCGTACCGCGGCCGCTCGACGGTTCTTGACTCACGA
AdaTGTC 5' TGTCGCAGGTACGTCGTACCGCGGCCGCCGTAAGGATCACGTGTGCTGT
AdaTGTA 5' TGTAGCAGGTACGTCGTACCGCGGCCGCCTTACGGATCGAGCGTGTACCA
AdaTGGT 5' TGGTGCAGGTACGTCGTACCGCGGCCGCCATACGCAACGACCCGAGTGAG
AdaTGGG 5' TGGGGCAGGTACGTCGTACCGCGGCCGCCGTATCCTTGCGTGTGTAATCC
AdaTGGC 5' TGGCGCAGGTACGTCGTACCGCGGCCGCATCGGAAGAAGCACGATGCCA
AdaTGGA 5' TGGAGCAGGTACGTCGTACCGCGGCCGCCCGATTTCTCGCGAGGACGTC
AdaTGCT 5' TGCTGCAGGTACGTCGTACCGCGGCCGCCGCATCCCACTGCTGGATACT
AdaTGCG 5' TGCGGCAGGTACGTCGTACCGCGGCCGCTACGGTTCGGTCGCTCTTCGT
AdaTGCC 5' TGCCGCAGGTACGTCGTACCGCGGCCGCCATTCGTCGAACCGCAGTTGC
AdaTGCA 5' TGCAGCAGGTACGTCGTACCGCGGCCGCCGAATTTCGACTCGCGAACGT
AdaTGAT 5' TGATGCAGGTACGTCGTACCGCGGCCGCAACGCGACGCAGTGACACGTT
AdaTGAG 5' TGAGGCAGGTACGTCGTACCGCGGCCGCGCTTTGCTTCTCCGTCTAGAC
AdaTGAC 5' TGACGCAGGTACGTCGTACCGCGGCCGCACGTAACGTCGAGCGCTAACG
AdaTGAA 5' TGAAGCAGGTACGTCGTACCGCGGCCGCTACGTGACGTGCGGAGTGATC
AdaTCTT 5' TCTTGCAGGTACGTCGTACCGCGGCCGCCGTAGACGCTCGGTAACTAGT
AdaTCTG 5' TCTGGCAGGTACGTCGTACCGCGGCCGCCTACGAGCGTCTCTCGCATAG
AdaTCTC 5' TCTCGCAGGTACGTCGTACCGCGGCCGCGCGAAACGCATCCGTTCGTGT
AdaTCTA 5' TCTAGCAGGTACGTCGTACCGCGGCCGCTTCGCTTCGTTGCTGCTAGTA
AdaTCGT 5' TCGTGCAGGTACGTCGTACCGCGGCCGCACGATCGACACCGGAGTCTAG
AdaTCGG 5' TCGGGCAGGTACGTCGTACCGCGGCCGCATCGTTGTCGGGACGTAGTCG
AdaTCGC 5' TCGCGCAGGTACGTCGTACCGCGGCCGCACGACCATCGTCTTCCGAGTT
AdaTCGA 5' TCGAGCAGGTACGTCGTACCGCGGCCGCGTCGTCGATGCGTCCATACTT
AdaTCCT 5' TCCTGCAGGTACGTCGTACCGCGGCCGCAACCGAACGTGTCGACCTTCA
AdaTCCG 5' TCCGGCAGGTACGTCGTACCGCGGCCGCTGGTTACGTTTGCGCGGTATT
AdaTCCC 5' TCCCGCAGGTACGTCGTACCGCGGCCGCCGAACCCTGGTCCTGTGTCTA
AdaTCCA 5' TCCAGCAGGTACGTCGTACCGCGGCCGCGTTCGCTCGTCACGAAGTACT
AdaTCAT 5' TCATGCAGGTACGTCGTACCGCGGCCGCCGCAAACCGCAGCTTCTAGTT
AdaTCAG 5' TCAGGCAGGTACGTCGTACCGCGGCCGCTTGCGACCGTTCGCAGCTAGT
AdaTCAC 5' TCACGCAGGTACGTCGTACCGCGGCCGCAGTCGCAGGTAAGCGTATGCT
AdaTCAA 5' TCAAGCAGGTACGTCGTACCGCGGCCGCCGACTATCGTAGGCTCCACGT
AdaTATT 5' TATTGCAGGTACGTCGTACCGCGGCCGCATCGAGCGATTCGAGGTTAGG
AdaTATG 5' TATGGCAGGTACGTCGTACCGCGGCCGCTCGATACCGTAGCGCCGTGGT
AdaTATC 5' TATCGCAGGTACGTCGTACCGCGGCCGCACCGAACTCTCTCGACGGCAT
AdaTATA 5' TATAGCAGGTACGTCGTACCGCGGCCGCTCGATCGTCACTCGTGGTTGA
AdaTAGT 5' TAGTGCAGGTACGTCGTACCGCGGCCGCGCGACTGACGACGAGGATAGT
AdaTAGG 5' TAGGGCAGGTACGTCGTACCGCGGCCGCGTCGTGGCAGACGGATAACCT
AdaTAGC 5' TAGCGCAGGTACGTCGTACCGCGGCCGCATGCGGTTCGTCAGTCGTGGA
AdaTAGA 5' TAGAGCAGGTACGTCGTACCGCGGCCGCCGTATCGATTCCGTCTAGCTC
AdaTACT 5' TACTGCAGGTACGTCGTACCGCGGCCGCACGAAGATCGACGGCTCCGTT
AdaTACG 5' TACGGCAGGTACGTCGTACCGCGGCCGCTCGAACGAAGACCACGTACTG
AdaTACC 5' TACCGCAGGTACGTCGTACCGCGGCCGCTGCTGACTTCGGCGGTGATCG
AdaTACA 5' TACAGCAGGTACGTCGTACCGCGGCCGCGTTCGTAGACGCGAGTCTGAC
AdaTAAT 5' TAATGCAGGTACGTCGTACCGCGGCCGCCAATCGCGTTTACTCGAATGG
AdaTAAG 5' TAAGGCAGGTACGTCGTACCGCGGCCGCCGATTCGGAACGAGCGACTAT
AdaTAAC 5' TAACGCAGGTACGTCGTACCGCGGCCGCCGTTCTTCCTGGCTCTAAGGC
AdaTAAA 5' TAAAGCAGGTACGTCGTACCGCGGCCGCGAACGCTAACGTGCGTGAACC
AdaGTTT 5' GTTTGCAGGTACGTCGTACCGCGGCCGCGATGAGCCAGGGAAGGAATTGG
AdaGTTG 5' GTTGGCAGGTACGTCGTACCGCGGCCGCGATTAGACGAGGCGACCGAGA
AdaGTTC 5' GTTCGCAGGTACGTCGTACCGCGGCCGCGACTAGGTCAGCTGAGGAGTCT
AdaGTTA 5' GTTAGCAGGTACGTCGTACCGCGGCCGCACTACGAGTGGCGAGCTGACT
AdaGTGT 5' GTGTGCAGGTACGTCGTACCGCGGCCGCGCCTACTGACTCGAGTACTCT
AdaGTGG 5' GTGGGCAGGTACGTCGTACCGCGGCCGCCGTTGTTCCCGACTCCTCTGT
AdaGTGC 5' GTGCGCAGGTACGTCGTACCGCGGCCGCCAACGGACACAGGTACCCAAG
AdaGTGA 5' GTGAGCAGGTACGTCGTACCGCGGCCGCTCACAGATTGCTTCGTACTGT
AdaGTCT 5' GTCTGCAGGTACGTCGTACCGCGGCCGCCGTCCTTATGCTCTCCGAGCT
AdaGTCG 5' GTCGGCAGGTACGTCGTACCGCGGCCGCCCTATCCAGTAATGCCCGAGA
AdaGTCC 5' GTCCGCAGGTACGTCGTACCGCGGCCGCCATGGGACAGATCAGGTTAAGG
AdaGTCA 5' GTCAGCAGGTACGTCGTACCGCGGCCGCTCAGTCAAACTGGACTCGCAC
AdaGTAT 5' GTATGCAGGTACGTCGTACCGCGGCCGCGAAGGCTGTGGTTGGCCACC
AdaGTAG 5' GTAGGCAGGTACGTCGTACCGCGGCCGCGCAGTCGTACACCTCATCTCA
AdaGTAC 5' GTACGCAGGTACGTCGTACCGCGGCCGCTAGACAACTGGCGCATTGACA
AdaGTAA 5' GTAAGCAGGTACGTCGTACCGCGGCCGCGATGCCACTAGGTGTTGGTGC
AdaGGTT 5' GGTTGCAGGTACGTCGTACCGCGGCCGCAGACCTCCCGCTCTACAGATC
AdaGGTG 5' GGTGGCAGGTACGTCGTACCGCGGCCGCGCGTGGCAGCACCCTTAGCCA
AdaGGTC 5' GGTCGCAGGTACGTCGTACCGCGGCCGCAGGTTTCGGCTGGAGTCTCAT
AdaGGTA 5' GGTAGCAGGTACGTCGTACCGCGGCCGCTCTGCAGGTCGTCCCTACACT
AdaGGGT 5' GGGTGCAGGTACGTCGTACCGCGGCCGCTGTATCAGCTATGCCGTTGCC
AdaGGGG 5' GGGGGCAGGTACGTCGTACCGCGGCCGCCGTTGCCTCATACCTCAGCATC
AdaGGGC 5' GGGCGCAGGTACGTCGTACCGCGGCCGCAGCAGACAGGCCAAATGACGT
AdaGGGA 5' GGGAGCAGGTACGTCGTACCGCGGCCGCAGCAGTTCGAGCACGTGAGTC
AdaGGCT 5' GGCTGCAGGTACGTCGTACCGCGGCCGCGAGATGTGCCACATGCCGACT
AdaGGCG 5' GGCGGCAGGTACGTCGTACCGCGGCCGCGTTAGCGGCTGTCTACCGAGT
AdaGGCC 5' GGCCGCAGGTACGTCGTACCGCGGCCGCGTGAATAGTGACCGTGTCCCAG
AdaGGCA 5' GGCAGCAGGTACGTCGTACCGCGGCCGCTGTGTACAGCGTTCCAAGGGT
AdaGGAT 5' GGATGCAGGTACGTCGTACCGCGGCCGCACGTCTCACTACGTGTCCGGA
AdaGGAG 5' GGAGGCAGGTACGTCGTACCGCGGCCGCGCTAGGTAAGAGTGCAAGGCA
AdaGGAC 5' GGACGCAGGTACGTCGTACCGCGGCCGCCCAAGTGCACGATACTGGCTA
AdaGGAA 5' GGAAGCAGGTACGTCGTACCGCGGCCGCTTCGTGTAGGGTGGTAAGCTT
AdaGCTT 5' GCTTGCAGGTACGTCGTACCGCGGCCGCTTCGACCCATGTCATCATCCT
AdaGCTG 5' GCTGGCAGGTACGTCGTACCGCGGCCGCGGAGGACTTGTCATGGAGGTT
AdaGCTC 5' GCTCGCAGGTACGTCGTACCGCGGCCGCAGATGTCTGTCCCACGGTGTT
AdaGCTA 5' GCTAGCAGGTACGTCGTACCGCGGCCGCAATGCCCTGTATGTTCCGTGG
AdaGCGT 5' GCGTGCAGGTACGTCGTACCGCGGCCGCATGCTTAGCTCCACGCTCATG
AdaGCGG 5' GCGGGCAGGTACGTCGTACCGCGGCCGCCCTCGTACCTTGTGACAGATG
AdaGCGC 5' GCGCGCAGGTACGTCGTACCGCGGCCGCTGTGCGAGCGTTCCATCTCCT
AdaGCGA 5' GCGAGCAGGTACGTCGTACCGCGGCCGCGATGCTGGTCGAGGACACAGA
AdaGCCT 5' GCCTGCAGGTACGTCGTACCGCGGCCGCAACCAGCTTCCACCTGGAGTT
AdaGCCG 5' GCCGGCAGGTACGTCGTACCGCGGCCGCGTGACGTTCGATGTGAGTCTG
AdaGCCC 5' GCCCGCAGGTACGTCGTACCGCGGCCGCGGTGTCCCATTCGGAGTTACA
AdaGCCA 5' GCCAGCAGGTACGTCGTACCGCGGCCGCCAGGTGAGGGAGAGCATCAAC
AdaGCAT 5' GCATGCAGGTACGTCGTACCGCGGCCGCCTCGACCTTGTTGGCAGACTC
AdaGCAG 5' GCAGGCAGGTACGTCGTACCGCGGCCGCTGCATTGCGGTTCCTCACAAC
AdaGCAC 5' GCACGCAGGTACGTCGTACCGCGGCCGCCTGAGAGCTGTTCACTGAGGT
AdaGCAA 5' GCAAGCAGGTACGTCGTACCGCGGCCGCCAGTGGATAGCGACCTGGATC
AdaGATT 5' GATTGCAGGTACGTCGTACCGCGGCCGCTGGATTTGAGATCGCATGTGG
AdaGATG 5' GATGGCAGGTACGTCGTACCGCGGCCGCGCGTCTAGGTCGTCACGACAA
AdaGATC 5' GATCGCAGGTACGTCGTACCGCGGCCGCCCCGATGATTAGACGCCTCAA
AdaGATA 5' GATAGCAGGTACGTCGTACCGCGGCCGCGATAGTGGCGATTGCTGAGCG
AdaGAGT 5' GAGTGCAGGTACGTCGTACCGCGGCCGCGGCGGCTCATGTGTGATAATG
AdaGAGG 5' GAGGGCAGGTACGTCGTACCGCGGCCGCGGTGGTTTTCTCTTGCCCCTC
AdaGAGC 5' GAGCGCAGGTACGTCGTACCGCGGCCGCGGAAGACAGCTCGCTGGACTT
AdaGAGA 5' GAGAGCAGGTACGTCGTACCGCGGCCGCGACCTTACTGAGAAGCGGCCT



AdaGACT 5' GACTGCAGGTACGTCGTACCGCGGCCGCTGTGGGGTTCTCTGCGTACAC
AdaGACG 5' GACGGCAGGTACGTCGTACCGCGGCCGCGGAACTGTTGTCCAGGAGACC
AdaGACC 5' GACCGCAGGTACGTCGTACCGCGGCCGCGATGCCAGAGTGATGCAACTG
AdaGACA 5' GACAGCAGGTACGTCGTACCGCGGCCGCCTCATTTCGCGATTCGGTCTT
AdaGAAT 5' GAATGCAGGTACGTCGTACCGCGGCCGCTGTGAACCGGGTGGTCTGAAC
AdaGAAG 5' GAAGGCAGGTACGTCGTACCGCGGCCGCCCAGAGGATCCTGCTGTAGCA
AdaGAAC 5' GAACGCAGGTACGTCGTACCGCGGCCGCGTTGCTGCCGGTCTAGGTATC
AdaGAAA 5' GAAAGCAGGTACGTCGTACCGCGGCCGCGTGGGTTTCCTAGCTCTCAGTG
AdaCTTT 5' CTTTGCAGGTACGTCGTACCGCGGCCGCGACGGCATCAGGATTTATCCC
AdaCTTg 5' CTTgGCAGGTACGTCGTACCGCGGCCGCGACAGGCAAGGGTTGAGACTG
AdaCTTc 5' CTTcGCAGGTACGTCGTACCGCGGCCGCGTGCTTAGGTGAGCTCCGTGA
AdaCTTa 5' CTTaGCAGGTACGTCGTACCGCGGCCGCGCATCAGGTCAGCTCCAAATC
AdaCTGT 5' CTGTGCAGGTACGTCGTACCGCGGCCGCACTCGATTTCATAGATACATC
AdaCTGg 5' CTGgGCAGGTACGTCGTACCGCGGCCGCCTGAACGGAGTTACGGGGTTG
AdaCTGc 5' CTGcGCAGGTACGTCGTACCGCGGCCGCCGAAGGTCGCACAAGTCGACG
AdaCTGa 5' CTGaGCAGGTACGTCGTACCGCGGCCGCCATGACCTGGTTGTGCGTGTT
AdaCTCT 5' CTCTGCAGGTACGTCGTACCGCGGCCGCGAGTCCAGTAGCTGGAGACGA
AdaCTCg 5' CTCgGCAGGTACGTCGTACCGCGGCCGCCGTCTGTAAGTGTTGAGCGTA
AdaCTCc 5' CTCcGCAGGTACGTCGTACCGCGGCCGCGTATAGTAGGTGCTGTTGCCC
AdaCTCa 5' CTCaGCAGGTACGTCGTACCGCGGCCGCGAGTGGAAGACTGGTCAGACG
AdaCTAT 5' CTATGCAGGTACGTCGTACCGCGGCCGCCGACTGGAAAACACCTCTCCC
AdaCTAg 5' CTAgGCAGGTACGTCGTACCGCGGCCGCTTCGCGACGACTACTTGGGTC
AdaCTAc 5' CTAcGCAGGTACGTCGTACCGCGGCCGCGGAGCCTGATTCACCATGGTG
AdaCTAa 5' CTAaGCAGGTACGTCGTACCGCGGCCGCTCGACCTTCCCACACTCAGGT
AdaCGTT 5' CGTTGCAGGTACGTCGTACCGCGGCCGCCTTCCGACAGACAGTTTCGCA
AdaCGTg 5' CGTgGCAGGTACGTCGTACCGCGGCCGCCTGCTTGAGGATTGCCAAAGC
AdaCGTc 5' CGTcGCAGGTACGTCGTACCGCGGCCGCTGTTCGATGAGTCAGGAGCGA
AdaCGTa 5' CGTaGCAGGTACGTCGTACCGCGGCCGCGTCGAGATGGTACCGACTCTG
AdaCGGT 5' CGGTGCAGGTACGTCGTACCGCGGCCGCGATCAACATATCCTTTGTCCGC
AdaCGGg 5' CGGgGCAGGTACGTCGTACCGCGGCCGCCTTCCGACAAAAGGTCCTAGTG
AdaCGGc 5' CGGcGCAGGTACGTCGTACCGCGGCCGCCTCACGCTAGCACCGCTCTTA
AdaCGGa 5' CGGaGCAGGTACGTCGTACCGCGGCCGCACCAACGAATCTCGAGGCTCC
AdaCGCT 5' CGCTGCAGGTACGTCGTACCGCGGCCGCGTCCATGCTGAAAGACCAGGC
AdaCGCg 5' CGCgGCAGGTACGTCGTACCGCGGCCGCGACAGACGTTCGGCTGACAAG
AdaCGCc 5' CGCcGCAGGTACGTCGTACCGCGGCCGCCTGTTATCCGCCTTGCAGCGT
AdaCGCa 5' CGCaGCAGGTACGTCGTACCGCGGCCGCTGAGGCACCTGTCTATGTGCTG
AdaCGAT 5' CGATGCAGGTACGTCGTACCGCGGCCGCCCCTTGGTGCAACGTGATGAT
AdaCGAg 5' CGAgGCAGGTACGTCGTACCGCGGCCGCAGGGTCGTACGTTCCTCTGGA
AdaCGAc 5' CGAcGCAGGTACGTCGTACCGCGGCCGCTGCTCGCTCTGTCTCCTGTTC
AdaCGAa 5' CGAaGCAGGTACGTCGTACCGCGGCCGCTCCACGTCGTACTGACCGTAG



AdaCCTT 5' CCTTGCAGGTACGTCGTACCGCGGCCGCCAGCGCTCCAGCGAATAATTT
AdaCCTg 5' CCTgGCAGGTACGTCGTACCGCGGCCGCCTGTATGCTACACCGCAGCTG
AdaCCTc 5' CCTcGCAGGTACGTCGTACCGCGGCCGCGCGTCTGGAACGATTACGAGT
AdaCCTa 5' CCTaGCAGGTACGTCGTACCGCGGCCGCCTTTCCGGCCATAACACGTAT
AdaCcgT 5' CcgTGCAGGTACGTCGTACCGCGGCCGCTTCCCAACCTAACTTCGGAGC
AdaCcgg 5' CcggGCAGGTACGTCGTACCGCGGCCGCGCTATCGTTGCAAGGGTGAAC
AdaCcgc 5' CcgcGCAGGTACGTCGTACCGCGGCCGCATCCGAGTAGGGAGTCTCCCT
AdaCcga 5' CcgaGCAGGTACGTCGTACCGCGGCCGCCTGGTACATGTCTGGCAACTC
AdaCcCt 5' CcCtGCAGGTACGTCGTACCGCGGCCGCTGCTTGTGGGAAGAAACTGGC
AdaCcCg 5' CcCgGCAGGTACGTCGTACCGCGGCCGCCTGGGTCGCATACTCGTTCAC
AdaCcCc 5' CcCcGCAGGTACGTCGTACCGCGGCCGCTAGGCGTTGCAATACAGGCAA
AdaCcCa 5' CcCaGCAGGTACGTCGTACCGCGGCCGCTCCTGATCTTCCAGCAGCTCA
AdaCcAt 5' CcAtGCAGGTACGTCGTACCGCGGCCGCTCATGTTGCGTATTCCCCTACT
AdaCcAg 5' CcAgGCAGGTACGTCGTACCGCGGCCGCGGTCACTCCTCGAAGTGGATC
AdaCcAc 5' CcAcGCAGGTACGTCGTACCGCGGCCGCCGTGTCGAGCGAATGTGGTAC
AdaCcAa 5' CcAaGCAGGTACGTCGTACCGCGGCCGCTGCCAATTCGAGAGGGAAATG
AdaCaTT 5' CaTTGCAGGTACGTCGTACCGCGGCCGCACATCTCGGCGATACATCGGA
AdaCaTg 5' CaTgGCAGGTACGTCGTACCGCGGCCGCTGACCTTCCTTCTCAGTTGGT
AdaCaTc 5' CaTcGCAGGTACGTCGTACCGCGGCCGCGAGTACTGGTGACTAGGATTCG
AdaCaTa 5' CaTaGCAGGTACGTCGTACCGCGGCCGCTCGGTAGGAGTGCATCCTTCC
AdaCagT 5' CagTGCAGGTACGTCGTACCGCGGCCGCGTGGTGAGGGAGGGAGCTATT
AdaCagg 5' CaggGCAGGTACGTCGTACCGCGGCCGCGAGATGTAGCAGATGGTCCTC
AdaCagc 5' CagcGCAGGTACGTCGTACCGCGGCCGCGATTAGGACCCGCGAAGTAAT
AdaCaga 5' CagaGCAGGTACGTCGTACCGCGGCCGCCATAGCCTTTTGTTGTCGCTG
AdaCaCT 5' CaCTGCAGGTACGTCGTACCGCGGCCGCCTAGTGATGAGGTGTCAACGC
AdaCaCg 5' CaCgGCAGGTACGTCGTACCGCGGCCGCGTTCGGACGaTGAGTGGTAGA
AdaCaCc 5' CaCcGCAGGTACGTCGTACCGCGGCCGCGCACTTCCTCAGAAGCATGCT
AdaCaCa 5' CaCaGCAGGTACGTCGTACCGCGGCCGCTGTAACACAAGCCACGATGACT
AdaCaAT 5' CaATGCAGGTACGTCGTACCGCGGCCGCTGGGACCCATAGCCAAGTGTC
AdaCaAg 5' CaAgGCAGGTACGTCGTACCGCGGCCGCTTGGAGCTCTCACGTACTGTG
AdaCaAc 5' CaAcGCAGGTACGTCGTACCGCGGCCGCCTGGTCCTGTTGGGTTGCTTC
AdaCaAa 5' CaAaGCAGGTACGTCGTACCGCGGCCGCTGCTGCTTCCTGGTGAGTCTCT
AdaATTT 5' ATTTGCAGGTACGTCGTACCGCGGCCGCTCAGCCTTCTGCTGTACCACA
AdaATTG 5' ATTGGCAGGTACGTCGTACCGCGGCCGCTCCTGCTTTGGCGAACTTGGA
AdaATTC 5' ATTCGCAGGTACGTCGTACCGCGGCCGCGAAGCATTCCAGAATCGCACG
AdaATTA 5' ATTAGCAGGTACGTCGTACCGCGGCCGCTCCTCTCCCACCATTCCGAGT
AdaATGT 5' ATGTGCAGGTACGTCGTACCGCGGCCGCTGGTTTCCTCGAGAATCTGCT
AdaATGG 5' ATGGGCAGGTACGTCGTACCGCGGCCGCTGCTGACCAGACCAGAGAGGT
AdaATGC 5' ATGCGCAGGTACGTCGTACCGCGGCCGCCTGCTCACTCATCCACTTGTCA
AdaATGA 5' ATGAGCAGGTACGTCGTACCGCGGCCGCTGTACCACGGTCGAAAAATCC
AdaATCT 5' ATCTGCAGGTACGTCGTACCGCGGCCGCTGGTGAGTAAAGCTCTCAGGC
AdaATCG 5' ATCGGCAGGTACGTCGTACCGCGGCCGCGTGCCATCGCAGAAGCTAAGC
AdaATCC 5' ATCCGCAGGTACGTCGTACCGCGGCCGCTGAGGCTACCTGACACTGCTG
AdaATCA 5' ATCAGCAGGTACGTCGTACCGCGGCCGCTGGTGCCAGAACCTCTTGTCT
AdaATAT 5' ATATGCAGGTACGTCGTACCGCGGCCGCTGCGACTTGATAGACCCACCA
AdaATAG 5' ATAGGCAGGTACGTCGTACCGCGGCCGCGTAGCATCAGTGGCTACGACA
AdaATAC 5' ATACGCAGGTACGTCGTACCGCGGCCGCGAGCATCGATGCCAGGGATGA
AdaATAA 5' ATAAGCAGGTACGTCGTACCGCGGCCGCCTCTGTCTGGGAAGGCACCTC
AdaAGTT 5' AGTTGCAGGTACGTCGTACCGCGGCCGCGAGAGGACGACTCCAGAGACG
AdaAGTG 5' AGTGGCAGGTACGTCGTACCGCGGCCGCCATCCATAGACCGTGACTCCA
AdaAGTC 5' AGTCGCAGGTACGTCGTACCGCGGCCGCGTCCACGTCATTCCAGGAGACT
AdaAGTA 5' AGTAGCAGGTACGTCGTACCGCGGCCGCCGATCTAGGTGAGGACACTGG
AdaAGGT 5' AGGTGCAGGTACGTCGTACCGCGGCCGCGAAGGACAACTGGCGTAGGCT
AdaAGGG 5' AGGGGCAGGTACGTCGTACCGCGGCCGCGTCGTTCGTATCCTCTACAAGG
AdaAGGC 5' AGGCGCAGGTACGTCGTACCGCGGCCGCGAGTTCCTGGCTGTGCTACCT
AdaAGGA 5' AGGAGCAGGTACGTCGTACCGCGGCCGCCTGCTGATGGATCGTGTACGA
AdaAGCT 5' AGCTGCAGGTACGTCGTACCGCGGCCGCGTGTTCGAAGGCAGAGGGTTC
AdaAGCG 5' AGCGGCAGGTACGTCGTACCGCGGCCGCGGTTTAAGGGGTCTAGGTTGAG
AdaAGCC 5' AGCCGCAGGTACGTCGTACCGCGGCCGCCTCCTCTGGGACGTTGCTAAG
AdaAGCA 5' AGCAGCAGGTACGTCGTACCGCGGCCGCCAGCAGACGCAGTGATCTCCT
AdaAGAT 5' AGATGCAGGTACGTCGTACCGCGGCCGCGAGCATGAAGTGAGCAGCTGG
AdaAGAG 5' AGAGGCAGGTACGTCGTACCGCGGCCGCCTGGCCTCAGTAGAGACTGGT
AdaAGAC 5' AGACGCAGGTACGTCGTACCGCGGCCGCTGGCAGGCACTAGTGCAATAG
AdaAGAA 5' AGAAGCAGGTACGTCGTACCGCGGCCGCCTACATCCTGTCGTGTCCGCA
AdaACTT 5' ACTTGCAGGTACGTCGTACCGCGGCCGCACGGATCTGAGGTAGCACTGG
AdaACTG 5' ACTGGCAGGTACGTCGTACCGCGGCCGCTGAGAAGCCTCAGCTCGATTC
AdaACTC 5' ACTCGCAGGTACGTCGTACCGCGGCCGCCGTTTGAGGTTAATTCGCCTG
AdaACTA 5' ACTAGCAGGTACGTCGTACCGCGGCCGCTGCTACTGCAAATACCCGAGC
AdaACGT 5' ACGTGCAGGTACGTCGTACCGCGGCCGCTCGTCTGCACGTAACTCCTCA
AdaACGG 5' ACGGGCAGGTACGTCGTACCGCGGCCGCGGATGGTTCAAGCGACTGTCA
AdaACGC 5' ACGCGCAGGTACGTCGTACCGCGGCCGCCACTCTCAGCTGGTGGTCTGT
AdaACGA 5' ACGAGCAGGTACGTCGTACCGCGGCCGCGCTTGGTTGATGGGAATGGAC
AdaACCT 5' ACCTGCAGGTACGTCGTACCGCGGCCGCCTGCAAGCAAGACTAGACGTAC
AdaACCG 5' ACCGGCAGGTACGTCGTACCGCGGCCGCTCCATGAAGCTCCCAAGTGTC
AdaACCC 5' ACCCGCAGGTACGTCGTACCGCGGCCGCTGCACATTGGTTAAACGCAGG
AdaACCA 5' ACCAGCAGGTACGTCGTACCGCGGCCGCTGCAGATCATTGCAGGTAGAT
AdaACAT 5' ACATGCAGGTACGTCGTACCGCGGCCGCAGCACGACTGGAAGACGGTCT
AdaACAG 5' ACAGGCAGGTACGTCGTACCGCGGCCGCAGCATGAACATCATCGGTGGT
AdaACAC 5' ACACGCAGGTACGTCGTACCGCGGCCGCTGGAGTCTGGACTGTGGTGGA
AdaACAA 5' ACAAGCAGGTACGTCGTACCGCGGCCGCCTTACACCTCGTCGACTCGTC



AdaAATT 5' AATTGCAGGTACGTCGTACCGCGGCCGCAGGGCTAAGCTTCCGTAGGTC
AdaAATG 5' AATGGCAGGTACGTCGTACCGCGGCCGCGTTGGCTAGTTGCGGTGGTGT
AdaAATC 5' AATCGCAGGTACGTCGTACCGCGGCCGCTCGCAGGTCTTAGGCACAACG
AdaAATA 5' AATAGCAGGTACGTCGTACCGCGGCCGCGTCAGGTCCGATTCTGGTTCC
AdaAAGT 5' AAGTGCAGGTACGTCGTACCGCGGCCGCCAGCTCGACCCATCAACTCCA
AdaAAGG 5' AAGGGCAGGTACGTCGTACCGCGGCCGCGCTACCGGCAACATAGCTGTC
AdaAAGC 5' AAGCGCAGGTACGTCGTACCGCGGCCGCTCCTCAGTATGTGCCACTCTGA
AdaAAGA 5' AAGAGCAGGTACGTCGTACCGCGGCCGCGTTGCTTGCCTACGTGCCATC
AdaAACT 5' AACTGCAGGTACGTCGTACCGCGGCCGCGACTAGGCCCGAAGCACAGAG
AdaAACG 5' AACGGCAGGTACGTCGTACCGCGGCCGCTTGCTGGACACAGGTGATACG
AdaAACC 5' AACCGCAGGTACGTCGTACCGCGGCCGCTGGAACGCCACCGTTGTTAG
AdaAACA 5' AACAGCAGGTACGTCGTACCGCGGCCGCCGTATGGGATCGAGGTTGCA
AdaAAAT 5' AAATGCAGGTACGTCGTACCGCGGCCGCTCAGTCTCATCAGCTCCTCAC
AdaAAAG 5' AAAGGCAGGTACGTCGTACCGCGGCCGCCCTTGGTGACGTAACCGTTCG
AdaAAAC 5' AAACGCAGGTACGTCGTACCGCGGCCGCGACGAGCCTAGGAACTGAAGT
AdaAAAA 5' AAAAGCAGGTACGTCGTACCGCGGCCGCGTGGAGTCGTGATGGAGACCT
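
Comparing the tail and Ada listings above suggests a regular construction: each Ada oligonucleotide appears to read, 5' to 3', as its four-base label, the constant segment GCAGGTACGTCGTACC, the eight-base sequence GCGGCCGC (the NotI recognition site), and the reverse complement of the tail primer carrying the same label. The Python sketch below is not part of the original disclosure. The rule is inferred from the listed sequences rather than stated in this section, and every function and variable name is hypothetical. It may be useful as a consistency check when transcribing the tables.

```python
# Sketch only: the construction rule is inferred from the listings above,
# and all names here are hypothetical (not taken from the source document).
CONSTANT = "GCAGGTACGTCGTACC"  # segment shared by the listed Ada sequences
NOTI_SITE = "GCGGCCGC"         # NotI recognition sequence

# Translation table for complementing bases; mixed case is preserved
# because the listings contain lower-case letters.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def expected_adaptor(label: str, tail: str) -> str:
    """Build the adaptor implied by the assumed rule for one label."""
    return label + CONSTANT + NOTI_SITE + reverse_complement(tail)


def is_consistent(label: str, tail: str, adaptor: str) -> bool:
    """Compare a listed adaptor with the assumed rule, ignoring case."""
    return expected_adaptor(label, tail).upper() == adaptor.upper()


# Two (label, tail primer, Ada sequence) triples copied from the listings.
EXAMPLES = [
    ("GCCG", "CAGACTCACATCGAACGTCAC",
     "GCCGGCAGGTACGTCGTACCGCGGCCGCGTGACGTTCGATGTGAGTCTG"),
    ("AAAA", "AGGTCTCCATCACGACTCCAC",
     "AAAAGCAGGTACGTCGTACCGCGGCCGCGTGGAGTCGTGATGGAGACCT"),
]

for label, tail, adaptor in EXAMPLES:
    print(label, is_consistent(label, tail, adaptor))
```

A row for which `is_consistent` returns `False` is a candidate transcription error in either table; it does not by itself show which of the two entries is wrong.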
5' Ph-TTTNGCAGGTACGTCGTACC GCGGCCGC GTGAGCTTGAGTCGGGTGGA
5' Ph-TTGNGCAGGTACGTCGTACC GCGGCCGC CCAACGTCGCGAGTTAGTCAG
5' Ph-TTCNGCAGGTACGTCGTACC GCGGCCGC AGGTAGACGCGGTATGTTCGTA
5' Ph-TTANGCAGGTACGTCGTACC GCGGCCGC CGGTGCTAGAGTCGCGTGTT
5' Ph-TGTNGCAGGTACGTCGTACC GCGGCCGC CGACAGTACCGCGACAGCTA
5' Ph-TGGNGCAGGTACGTCGTACC GCGGCCGC GCACTTAACTACGCCGACGAAG
5' Ph-TGCNGCAGGTACGTCGTACC GCGGCCGC gTACTAGCCTAACCGAGGCGTA
5' Ph-TGANGCAGGTACGTCGTACC GCGGCCGC TCGGATCACGTACACGTGCT
5' Ph-TCTNGCAGGTACGTCGTACC GCGGCCGC GTACGTCGCCTAGTCGACCTG
5' Ph-TCGNGCAGGTACGTCGTACC GCGGCCGC cTCTCCTAACGGACCGACTAAC
5' Ph-TCCNGCAGGTACGTCGTACC GCGGCCGC CGTTCCGATCTAGCGGTATCTT
5' Ph-TCANGCAGGTACGTCGTACC GCGGCCGC gcACCCGTACaGGATGTGAG
5' Ph-TATNGCAGGTACGTCGTACC GCGGCCGC GCAACGCGCTATGCTCGTag
5' Ph-TAGNGCAGGTACGTCGTACC GCGGCCGC GACTgTGGAACTACGACGATCg
5' Ph-TACNGCAGGTACGTCGTACC GCGGCCGC aGCaGACCGAACCCTAGTCGC
5' Ph-TAANGCAGGTACGTCGTACC GCGGCCGC cATACGTCGTAgggTTCGCGA
5' Ph-GTTNGCAGGTACGTCGTACC GCGGCCGC ctCTCATACGCGTCTGCGCGT
5' Ph-GTGNGCAGGTACGTCGTACC GCGGCCGC gAGTgTGCCTTACGTCGAGttc
5' Ph-GTCNGCAGGTACGTCGTACC GCGGCCGC GTcACGTtGCGGCCTTAGTC
5' Ph-GTANGCAGGTACGTCGTACC GCGGCCGC GagGTACGAgACTTGACACACG
5' Ph-GGTNGCAGGTACGTCGTACC GCGGCCGC GACcAGttGCCTAACGGAcACT
5' Ph-GGGNGCAGGTACGTCGTACC GCGGCCGC GCAACTAGTCTCGACCTGCGA
5' Ph-GGCNGCAGGTACGTCGTACC GCGGCCGC GTACCTCGACGACCGTACTGTg
5' Ph-GGANGCAGGTACGTCGTACC GCGGCCGC ACGCGTGATAGTAGGGAGTCG
5' Ph-GCTNGCAGGTACGTCGTACC GCGGCCGC CACTAGAGCGGCGTCAGTCTA
5' Ph-GCGNGCAGGTACGTCGTACC GCGGCCGC GCACAGCGCTAGCACAGGA
5' Ph-GCCNGCAGGTACGTCGTACC GCGGCCGC TACCGACAGTCCTCTGCGTGC
5' Ph-GCANGCAGGTACGTCGTACC GCGGCCGC CTACGCTACGTTGCGAAGAAGGTA
5' Ph-GATNGCAGGTACGTCGTACC GCGGCCGC GTCTGTCGTACCTGTCAGTGACTg
5' Ph-GAGNGCAGGTACGTCGTACC GCGGCCGC ATCGAACCGTGCTCCTTGG
5' Ph-GACNGCAGGTACGTCGTACC GCGGCCGC AGGTTGAGGTGTACGCGATAGC
5' Ph-GAANGCAGGTACGTCGTACC GCGGCCGC GACTTcAACCCCTGACGTACACa
5' Ph-CTTNGCAGGTACGTCGTACC GCGGCCGC CTACTCGCGAGAGAGGGCTATG
5' Ph-CTGNGCAGGTACGTCGTACC GCGGCCGC CTTGATCCGTAGTCGAGACGG
5' Ph-CTCNGCAGGTACGTCGTACC GCGGCCGC GTACAGACGTAGCGATCGCaG
5' Ph-CTANGCAGGTACGTCGTACC GCGGCCGC gTGACTAACGAGGTCTGTAAGCTa
5' Ph-CGTNGCAGGTACGTCGTACC GCGGCCGC GTCTgAGAGTCGACTgCGCTAAG
5' Ph-CGGNGCAGGTACGTCGTACC GCGGCCGC CTcAGTAAGCCGGAGTCTAGCTAg
5' Ph-CGCNGCAGGTACGTCGTACC GCGGCCGC CGCCCTAAACGGGATCGAGCGA
5' Ph-CGANGCAGGTACGTCGTACC GCGGCCGC CGTACAGGCTAGGGGTTAGTCG
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